Bug 507797 – Graphs taking too much cpu

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 507797 - Graphs taking too much cpu


Summary:	Graphs taking too much cpu


Status:	RESOLVED FIXED

Product:	system-monitor
Classification:	Core
Component:	general
Version:	git master
Hardware:	Other Linux

Importance:	High critical
Target Milestone:	---
Assigned To:	Karl Lattimer
QA Contact:	System-monitor maintainers

URL:
Whiteboard:

Duplicates:	528026 540542 (view as bug list)
Depends on:
Blocks:

Reported:	2008-01-07 09:06 UTC by Michael Monreal
Modified:	2011-11-11 10:03 UTC

See Also:
GNOME target:	---
GNOME version:	2.21/2.22

Attachments
Sysprof (391.34 KB, application/octet-stream) 2008-01-08 22:01 UTC, Benoît Dejean		Details
patch to optimise graph drawing (8.59 KB, patch) 2008-01-11 16:29 UTC, Toby Dacre	reviewed	Details \| Review
now without moving load_graph_configure() (7.28 KB, patch) 2008-01-11 17:07 UTC, Toby Dacre	committed	Details \| Review
wierd test case (681 bytes, patch) 2008-01-12 01:13 UTC, Toby Dacre	none	Details \| Review
nicer frame limits (1.44 KB, patch) 2008-01-22 12:13 UTC, Toby Dacre	none	Details \| Review
Patch that tries to avoid cairo compositing (18.72 KB, patch) 2008-04-12 21:43 UTC, Jörgen Scheibengruber	none	Details \| Review
notgood (75.31 KB, image/png) 2008-04-16 19:25 UTC, Samuel Buffet		Details
notgoodatall (118.45 KB, image/png) 2008-04-16 19:26 UTC, Samuel Buffet		Details
trunk (115.07 KB, image/png) 2008-04-17 11:55 UTC, Samuel Buffet		Details
slightly improved version of jorgen's patch (17.62 KB, patch) 2008-04-17 19:12 UTC, Karl Lattimer	reviewed	Details \| Review
much improved patch (13.79 KB, patch) 2008-04-18 12:36 UTC, Karl Lattimer	none	Details \| Review
another alternative (8.40 KB, patch) 2008-04-18 14:08 UTC, Toby Dacre	none	Details \| Review
merged patches of mine and tobys (13.64 KB, patch) 2008-04-19 07:34 UTC, Karl Lattimer	none	Details \| Review
added improved graph line position calculation (13.64 KB, patch) 2008-04-19 11:32 UTC, Karl Lattimer	none	Details \| Review
Fixed position... again! (13.66 KB, patch) 2008-04-19 14:28 UTC, Karl Lattimer	none	Details \| Review
and again! (13.80 KB, patch) 2008-04-19 16:26 UTC, Karl Lattimer	committed	Details \| Review

Description Michael Monreal 2008-01-07 09:06:05 UTC

When going to the memory usage tab, I always see one CPU go to 50% and stay there. This seems to be caused by the new graphs... which makes the whole thing a bit pointless :(

Comment 1 Benoît Dejean 2008-01-07 09:11:39 UTC

What kind of CPU do you have and which refresh rate are you using ?

Comment 2 Michael Monreal 2008-01-07 09:23:33 UTC

Interval is set to 1sec (which seems to be the default). My cpu is a core 2 duo 6600 (2 * 2.4Ghz). Setting the graph interval to 2sec reduces the effect, but still one of the cores stays at 30%.

Comment 3 Benoît Dejean 2008-01-07 10:56:32 UTC

OK. With 2.20, what CPU usage do you get ?
Also, have a look whether or not you CPU is running at full speed when testing.

Comment 4 Toby Dacre 2008-01-07 11:09:07 UTC

If this is the svn version then it's related to bug #328101 and should be being worked on.

Comment 5 Benoît Dejean 2008-01-08 17:48:28 UTC

I'll try to bring a sysprof profile ASAP.

Comment 6 Michael Monreal 2008-01-08 17:59:15 UTC

Toby: yes, the problem only happens in the trunk version. At least, I have never noticed something like this with older (stable) versions

Comment 7 Benoît Dejean 2008-01-08 22:01:37 UTC

Created attachment 102424 [details]
Sysprof

I've tested in a vmware and X takes a lot of CPU. Could please check on your system with top that while you run the resource tab ?

Comment 8 Michael Monreal 2008-01-08 22:13:47 UTC

Right...

 4999 root      15   0 93536  56m  10m S   57  2.8   5:32.62 Xorg

but why?

Comment 9 Toby Dacre 2008-01-09 00:20:39 UTC

There are lots of optimisation of the graph drawing that can be done.  This can be fixed before release (even if some of the new graph goodness is held back).  I've seen quite a few areas that can be looked at and hopefully each can save some CPU.  I'm holding off as it's assigned to Karl and I don't want to step on his toes.

My view is that <10% CPU on a 1GHz machine @ 1 second update should be a target.

Question:  Does cairo offload some of it's processing to Xorg?

Comment 10 Benoît Dejean 2008-01-09 08:58:47 UTC

(In reply to comment #9)

> My view is that <10% CPU on a 1GHz machine @ 1 second update should be a
> target.
> 
> Question:  Does cairo offload some of it's processing to Xorg?
> 

It's very likely. On my ibook, i haven't noticed any performance change. The overall CPU usage is still ~25% (X 15%-20%) with 1s refresh rate. But on a vmware with a default vesa driver, it's 100% CPU usage.

Comment 11 Toby Dacre 2008-01-11 16:29:57 UTC

Created attachment 102599 [details] [review]
patch to optimise graph drawing

OK, I've been looking at the graph drawing and what is slowing it down.

This patch reduces the CPU usage in the Resources tab by about 90% on my system.  I can now update the graphs every 0.25 second with CPU at ~25%.  Before the patch CPU was at 100% with 1 second updates.  It is now ~7% on 1 second update.

changes

* FRAMES constant has been replaced by g->frames_per_unit.  This allows the number of frames to be dynamic.  We only have as many frames as we need.

* render_counter changed from gint to guint to stop unneeded conversion.

* delx, dely, real_draw_height now are only calculated when the background is drawn not every redraw.  They are now stored in g->graph_delx, g->graph_dely, g->real_draw_height.  Much of patch is just these being changed.

* g->graph_buffer_offset to store (int) (1.5 * g->graph_delx) + FRAME_WIDTH  which was been calculated on each redraw. This is the biggest win.  I think mainly because we now give cairo an int.

* the background colour is now set in the g->background_buffer not each redraw.  The buffer size is slightly taller to cover the whole 'canvas'.

* removed unneeded (g->background_buffer == NULL) check.

* load_graph_configure() has been moved as it now references load_graph_update()

* load_graph_configure() now recalculates g->frames_per_unit on resize and adjusts the timeout.

* This also fixes the problem with the timings.  This means the bottom scale on the graphs is now pretty accurate.

Sorry it is a bit long but I think the performance improvement makes it worth the time needed to review it.

I'll add a note to Bug 328101

Comment 12 Karl Lattimer 2008-01-11 16:46:58 UTC

Toby, thanks for the patch firstly, I see pretty much how it works however I would really prefer if you would sort out the load graph configure changes so it doesn't appear the whole function is removed and re-added.

This should be a matter of swapping a function placement and should reduce the size of the patch dramatically... 

Then I'll review the replacement patch and probably commit it if it follows exactly what my thinking on this is.

Thanks again, it looks great but I want to double check that first :)

Comment 13 Toby Dacre 2008-01-11 16:51:33 UTC

OK,

I have a plan to do that.  Give me 20mins

Comment 14 Toby Dacre 2008-01-11 17:07:05 UTC

Created attachment 102603 [details] [review]
now without moving load_graph_configure()

OK,

Here we go I've put a predeclaration in so that I don't need to move it now.

Hope this is clearer :)

Comment 15 Karl Lattimer 2008-01-11 17:17:08 UTC

This patch seems to do the job! 

Although I had to fix an off by one error :P

Comment 16 Toby Dacre 2008-01-11 17:28:11 UTC

> Although I had to fix an off by one error :P

And you expect me to look through that whole patch and find it!

Comment 17 Karl Lattimer 2008-01-11 17:32:05 UTC

Hey you expect me to look through your patches ! :)

if (g->render_counter >= g->frames_per_unit)

That was it, with the > sign in there it rolled back a frame once in a while. I had exactly the same problem when I had my first punt at the smooth scroll effect :D

Comment 18 Toby Dacre 2008-01-11 17:43:39 UTC

lol, That's good I wasn't sure if the 'jump' was caused by all the
extra rendering work.

The vision is almost complete :)

Comment 19 Benoît Dejean 2008-01-11 19:21:16 UTC

Toby, your patch doesn't remove any cairo call. Saving a few mult here and there is OK, but i can't see how it can saves CPU cycle on the X server. Moreover, except the real_draw_heigh, i don't see any real improvement that the compiler would not do by itself.

The only real thing is that you moved this:
+	// set the background colour
+	GtkStyle *style = gtk_widget_get_style (g->notebook);
+	gdk_cairo_set_source_color (tmp_cr, &style->bg[GTK_STATE_NORMAL]);
+	cairo_paint (tmp_cr);

The speedup is much more likely to come from that. Anyone can test this single patch and confirm ? (/me has serious business to do (like play Supreme Commander) ;)

The frames_per_unit patch is unrelated. It's a good idea (but doesn't make any difference) but it should be splitted.

This needs real benchmarking.

What i did once was KISS. Add a counter, counter++ on each iteration, exit() when counter = 1000. Then run /usr/bin/time ./gnome-system-monitor. and compare.

Your patch is nice, but it would be better to split everything and carefully benchmark each part, see if the speedup worths the complexity added or not.

Comment 20 Benoît Dejean 2008-01-11 19:34:23 UTC

(In reply to comment #14)
> Created an attachment (id=102603) [edit]
> now without moving load_graph_configure()
> 
> OK,
> 
> Here we go I've put a predeclaration in so that I don't need to move it now.
> 
> Hope this is clearer :)
> 

I don't see any CPU usage improvment at all with this patch (which is now commited).

Where's the off by 1 error ?

(Toby, be careful with long lines, we "try" to stick to 80c)

Comment 21 Toby Dacre 2008-01-11 19:43:25 UTC

(In reply to comment #20)


> I don't see any CPU usage improvment at all with this patch (which is now
> commited).
> 

Are you using hardware acceleration or in a vm?  I think this may be important.  I'm using nvidia

For me the improvement is very dramatic over 90%

As I said for me the big improvement was giving cairo an int for it's transformation

* g->graph_buffer_offset to store (int) (1.5 * g->graph_delx) + FRAME_WIDTH 
which was been calculated on each redraw. This is the biggest win.  I think
mainly because we now give cairo an int.


> Where's the off by 1 error ?
> 

see above comments

> (Toby, be careful with long lines, we "try" to stick to 80c)
> 

OK,  I was really commenting the patch (not for adding)

Comment 22 Benoît Dejean 2008-01-11 19:49:56 UTC

(In reply to comment #21)
> (In reply to comment #20)
> 
> 
> > I don't see any CPU usage improvment at all with this patch (which is now
> > commited).
> > 
> 
> Are you using hardware acceleration or in a vm?  I think this may be important.
>  I'm using nvidia

No, driver radeon on my ibook.

> For me the improvement is very dramatic over 90%

Where ? On the X process or system-monitor ?

> As I said for me the big improvement was giving cairo an int for it's
> transformation
> 
> * g->graph_buffer_offset to store (int) (1.5 * g->graph_delx) + FRAME_WIDTH 
> which was been calculated on each redraw.

Come on, that's absolutely nothing to save. And that can't save CPU on the X server.

> This is the biggest win.  I think
> mainly because we now give cairo an int.

That it's a round value, maybe, but i get nothing here. nothing at all.
Saving some cast doesn't bring any boost. There's not gint->guint, it's noop. I'm somehow saying that i'm unhappy with the patch because you claim that you save 90% of CPU, but you don't say where and how.

Comment 23 Toby Dacre 2008-01-11 20:51:28 UTC

This is not very scientific but here is the results of top

rev 2263

 5270 root      25   0  130m  46m 9856 R 97.9  4.6  74:41.56 Xorg               
 5657 toby      15   0 86548  32m  10m S  0.7  3.2   8:08.80 compiz.real        
17438 toby      15   0 40596  17m  12m S  0.7  1.7   0:00.51 gnome-system-mo    
 5037 haldaemo  16   0  3260 1188 1036 S  0.3  0.1   0:06.67 hald-addon-stor    
    1 root      15   0  2952 1856  532 S  0.0  0.2   0:01.18 init               
    2 root      12  -5     0    0    0 S  0.0  0.0   0:00.00 kthreadd           
    3 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 migration/0        
    4 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/0        
    5 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/0    

rev 2272

 5270 root      15   0  129m  45m 9860 S  8.0  4.5  75:06.40 Xorg               
 5657 toby      15   0 86552  32m  10m S  3.3  3.2   8:10.53 compiz.real        
19722 toby      15   0 40596  17m  12m S  1.3  1.7   0:00.46 gnome-system-mo    
  184 root      15   0     0    0    0 S  0.3  0.0   0:00.87 pdflush            
17437 toby      15   0  2368 1172  876 R  0.3  0.1   0:00.24 top                
    1 root      15   0  2952 1856  532 S  0.0  0.2   0:01.18 init         


These are with all the same settings etc

Now g-s-m is using about the same CPU in both, but for whatever reason X is not getting pulped in rev 2272.

I've have played about with what looked like it would make a difference for me and the patch makes g-s-m work for me and it is a constant very noticeable one.

I'm happy to look at what's made the difference but I'm no expert and don't claim to be. The CPU problem seems to be happening within X.  Will  /usr/bin/time ./gnome-system-monitor show a difference?

Maybe this only helps nvidia systems? I don't know.

Comment 24 Karl Lattimer 2008-01-11 21:19:12 UTC

i've been running a few sysprof tests and I can see that in the latest revision its spending around 50% cpu time in cairo and in the previous revision its spending about 60-70% in cairo. 

Sysprof will only let me go so far, so I think I need to install some debuginfo files to get a deeper look but I think we can still reduce this plenty.

Comment 25 Benoît Dejean 2008-01-11 21:39:43 UTC

(In reply to comment #24)
> i've been running a few sysprof tests and I can see that in the latest revision
> its spending around 50% cpu time in cairo and in the previous revision its
> spending about 60-70% in cairo. 
> 
> Sysprof will only let me go so far, so I think I need to install some debuginfo
> files to get a deeper look but I think we can still reduce this plenty. 
> 

As you can see, it's about X CPU times. So the few savings of Toby's patch are irrelevant. What is needed is to revert and find what exactly brings a performance improvement on some computers and why.

Comment 26 Karl Lattimer 2008-01-11 22:25:30 UTC

It mostly comes down to cairo performance in X, you see on my machine the X process is guzzling cpu not gsm. So its all about the graphics drivers, nvidia perform best overall, intel have smooth performance but it sometimes uses a lot of cpu and amd is lagging behind somewhat at the minute...

Comment 27 Benoît Dejean 2008-01-11 22:44:02 UTC

Yes but that doesn't explain the improvement. If we can't say where is the improvement, one day we'll revert it without even knowing.

Comment 28 Toby Dacre 2008-01-12 01:13:15 UTC

Created attachment 102638 [details] [review]
wierd test case

reverting to version 2263
and then applying this test case saves most of cpu in X for me (I'm sorry it is not better quantified)

The patch makes the calculation in cairo_set_source_surface() in some way easier for  X/cairo.  This would seem to be a good place to probe at stuff.  The calculations result is approximately correct so I don't think it is this that makes the difference.

Yes looking at the patch it makes no sense as more work is being done.  The calculation still happens and even gets output as a bonus.  This implies to me it is something with cairo.

I hope this helps track this down.

Comment 29 Toby Dacre 2008-01-12 01:15:49 UTC

Sorry reverting to 2265

Comment 30 Toby Dacre 2008-01-12 13:57:12 UTC

OK, an easier test case.

changing the one line from

	cairo_set_source_surface (cr, g->graph_buffer, -1*((g->render_counter) * (delx/FRAMES)) + (1.5*delx) + FRAME_WIDTH, FRAME_WIDTH);

to 

	cairo_set_source_surface (cr, g->graph_buffer, (int)(-1*((g->render_counter) * (delx/FRAMES)) + (1.5*delx) + FRAME_WIDTH), FRAME_WIDTH);

That is changing the cast to an int makes all the difference for me.  I'm think it is related to clipping or something that has similar behaviour.  Even if not called directly by g-s-m.  Maybe we are using different cairo versions and this 'bug' behaves differently.  I'm using libcairo2 1.4.10-1ubuntu4.4.


From http://cairographics.org/FAQ/#performance_concerns

The cairo_clip function can be used for two different operations. The first is to restrict the set of pixels that need to be updated, (imagine needing to draw only half of a window that was just uncovered by another window). While the second is to modify what is being drawn with some *non-pixel-aligned* shape (imagine a circular clip path, for example).

These two uses result in *very different* code paths inside cairo and in the underlying window system. The first case can result in *faster* performance compared to unclipped drawing, since the same drawing operations can be performed but fewer pixels are updated. The second case can actually result in *slower* performance as an extra compositing step must be added to get the clipped result. This is because the pixels on the edge of the mask will have different values than they would in the unclipped case, (a non-pixel-aligned clip path results in antialiased clipping).

Comment 31 Jean-François Fortin Tam 2008-01-20 21:41:23 UTC

I assume you folks are aware of this already, but g-s-m from trunk is terrible on any computer without acceleration. A fairly recent laptop I have here has an ATI gpu in it, running in vesa mode. It hits 100% CPU usage, whereas this was not the case when looking at graphs in g-s-m 2.20.

On another computer with an nvidia card & compiz, it works fine. Just saying that no matter how pretty this is, you'll need to take into account computers that do not have acceleration if you don't want a flurry of bug reports for gnome 2.22 :)

Comment 32 Toby Dacre 2008-01-21 01:35:21 UTC

This seems to be a video driver issue (as suggested by cairo developers).  I get good results with nvidia (closed) but with nv things are really bad, testing on the same machine.  Things are so bad that even the 5 frame limit is way too much.  Performance at 1 frame limit is higher than I'd like but probably ~ what it was in 2.20 I still need to do a comparison.

I also had other problems with the nv driver at high frame rates/update times when Xorg was using almost all of my cpu.  The colour background for the graph surface became the wrong colour on some resizing of g-s-m but inconsistently.

It appears to be a problem with render when there is no acceleration.  We should minimise the use of rendering on machines with no acceleration. On machines that have acceleration we should give them the nice smoother graph experience.  The limit of 5 seems to penalise both parties.  For non accelerated their should be a limit of 1 as anymore will have a major effect.  For those with acceleration this becomes almost a non-issue - drawing the graph curve seems to become the limiting factor.

How about detect the video driver and make a decision?  I'm not sure how easy this would be to test or how workable it would be.

The other option might be a config setting somewhere to allow for enabling 'acceleration' or there might even be one we can use.

Comment 33 Karl Lattimer 2008-01-21 09:14:17 UTC

Toby if you could write the patch I asked for so we can PHYSICALLY set the FRAMES again to whatever we want this problem goes away, but your patch is already applied and convoluted around the frame timing. Please fix it so I can add it to SVN.

The performance issue isn't something we can fix so we switch it off until it IS fixed.

Comment 34 Toby Dacre 2008-01-21 18:43:05 UTC

Karl

I've done the patch but want to test it a little more.

Comment 35 Toby Dacre 2008-01-22 12:13:27 UTC

Created attachment 103423 [details] [review]
nicer  frame limits

patch introduces MAX_FRAMES that can be used to set how many frames get used or 0 to disable frame limit.

Fixed NUM_POINTS problem back to 60 seconds on scale

MAX_FRAMES set to 5 but I think this is a bad choice see comment #34.

Comment 36 Thomas Wood 2008-03-24 00:20:43 UTC

Not quite sure what the outcome of this discussion has been, but I seem to get 100% CPU usage when observing the graphs on the 2.22.0 released version.

Comment 37 Karl Lattimer 2008-03-25 13:38:57 UTC

Hi Thomas,

I'm a little confused here... I don't know what hardware is causing the problem for you, VMs are apparently particularly poor? This is a really anoying problem, but I did fix a bunch of issues related to this in SVN (frame timing, limiting to 5fps etc...) Now I doubt that a few lines with 60 points, which is rendered to a buffer and composited together 5 times per second can be causing the issue it must be something in X, most likely a driver issue...

Comment 38 Thomas Wood 2008-03-25 13:54:33 UTC

This is running on a iBook G4 (PPC, 800MHz) using the radeon driver supplied with X.org.

To be honest, I think the only way of "fixing" this bug would be to provide an alternative version of the graphs that did not use so many resources.

I did try changing the update interval option of the graphs to 1 (i.e. once a second), but it didn't seem to have any effect on the speed of the graph movement.

Comment 39 Benoît Dejean 2008-03-25 14:02:31 UTC

My main computer is a G4, here's my X config

Section "Device"
        Identifier      "ATI Radeon Mobility 9200 M9+"
        Driver          "radeon"
        Option          "AGPMode" "4"
        Option          "UseFBDev" "true"
        Option          "EnablePageFlip" "true"
        Option          "AGPFastWrite" "true"
        Option          "AccelMethod" "EXA"
        #Option         "XaaNoOffscreenPixmaps"
EndSection


The graph don't use much resource, it's basic cairo usage. The actual CPU problem is on the X side on some configs, and we don't know why.

Comment 40 Karl Lattimer 2008-03-25 14:17:55 UTC

If the old system monitor uses ~5% CPU time, then the new one should only use ~ 5 * 5 (%) CPU time...

In fact it should use less than 5*5 as the background is drawn less frequently... I think this bug may be in compositing surfaces rather than in the drawing...

Comment 41 Michael Meeks 2008-03-28 17:21:49 UTC

I have the same problem with Ati - an unusuable system when rendering the graphs :-) I imagine we need to be pulling more tricks to render less; but having not read the code will refrain from commenting. Having said that it's really tough to optimise successfully for a situation you can't reproduce :-)

Comment 42 Alex Shulgin 2008-04-02 08:22:44 UTC

Hi,

I'm experiencing the similar problem (100% CPU usage) on my Debian lenny box with amd64 CPU.

Here's some details:

I use proprietary nVidia driver: NVIDIA-Linux-x86_64-169.12

In xorg.conf:

Section "Device"
    Identifier     "nVidia Corporation NV43 [GeForce 6600 GT]"
    Driver         "nvidia"
EndSection

Here's what I see in XOrg.log:
(II) Module nvidia: vendor="NVIDIA Corporation"
        compiled for 4.0.2, module version = 1.0.0
        Module class: X.Org Video Driver

"cat /proc/version" gives:
Linux version 2.6.22 (2.6.22-6.lenny1) (root@zrbite) (gcc version
4.2.3 (Debian 4.2.3-2)) #1 SMP Fri Mar 14 11:21:44 EET 2008

The driver was compiled with the same gcc version as the kernel.

I've filed a bug originally at Debian BTS, here's the link: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=472777

--
Cheers,
Alex

Comment 43 Marcus Isaksson 2008-04-10 07:23:09 UTC

I experience the same problem (on Intel 945, Ubuntu Hardy).

Some profiling gave that most time is spent in the function fbFetchTransformed_Bilinear_General in pixman. This function copies pixel after pixel while computing transformations and doing tons of checks for every pixel.
This of course is a bit excessive since our transformation is just a horisontal translation by an integral number of pixels.

Actually, just doing
- cairo_set_source_surface (cr, g->graph_buffer, g->graph_buffer_offset - tmp, FRAME_WIDTH);
+ cairo_set_source_surface (cr, g->graph_buffer, 0, 0);

in loadgraph.cpp makes a threefold performance improvement (50%->15% on my system). (but then the graph doesn't move)

Similar performance improvements can be achieved by redrawing the graph every time (thus avoiding the translation), but that might hurt the performance on better accelerated graphic cards.

Comment 44 Benoît Dejean 2008-04-10 08:24:55 UTC

Well, the whole point was to make the new graph slide where the old graph would jump on each refresh.

Comment 45 Karl Lattimer 2008-04-10 08:35:49 UTC

Ok, I'd like to make some important points;

Hardy != better

if the intel driver is doing that when it should be blitting the surfaces together then there is something drasticly wrong with the intel driver. Which is a shame because it's now open source (or at least the one I'm using is, I don't know if there are hacks in hardy).

However, i seem to remember EXA is now fast as hell on the 945gm, well was the last time i booted my macbook into linux, so chances are you're using XXA

EXA should have become the default in hardy, why not go and test EXA and file a nice bug report on launchpad about it

Comment 46 Karl Lattimer 2008-04-10 08:38:42 UTC

> Section "Device"
>     Identifier     "nVidia Corporation NV43 [GeForce 6600 GT]"
>     Driver         "nvidia"
> EndSection
> 

With nvidia, it is always best practice to ensure you have the render extension enabled, and fast writes, and side band addressing and lots of other tweaks, the default config isn't very good.

Comment 47 Michael Meeks 2008-04-10 12:19:32 UTC

Can we sacrifice the concept of static grid-lines, and just do XCopyArea's (and scroll the grid-lines too), this rendering only a few, new pixels each time (?) Failing that, can we render the graph with XCopyArea, and draw the grid-lines over the top afterwards ?

Comment 48 Karl Lattimer 2008-04-10 12:55:29 UTC

We already only draw the grid lines once every time configure is called, which is very rarely as we only ever need to draw the grid lines when the window is resized.

We cache that surface and use cairo_set_source_surface to then draw it out to the context when we want to redraw the graph.

So effectively, two surfaces are being created then composited on top of one another, afaik using a buffered surface does not redraw the lines every time, and painting two rectangles ontop of one another using cached surfaces shouldn't need to draw each pixel in turn.

Comment 49 Alexander “weej” Jones 2008-04-10 16:21:26 UTC

I notice that as I resize the monitor view, the background grid resizes way before the foreground resizes.

In fact, even after the resize, the monitor continues to draw the line in its old geometry for quite a considerable time before moving.

This leads me to believe we are saturating the X server with drawing requests, and that they are queuing up on the server behind time.

sync()ing the display in the widget expose() function would normally repair this issue for similar problems, as it prevents GTK from being able to send more drawing requests before the last set have finished and thus removes any possibility of queueing. However, because of this exotic method of drawing the foreground and background separately, I was unable to figure out how to do this in your code.

Comment 50 Karl Lattimer 2008-04-10 16:27:05 UTC

(In reply to comment #49)
> I notice that as I resize the monitor view, the background grid resizes way
> before the foreground resizes.
> 
> In fact, even after the resize, the monitor continues to draw the line in its
> old geometry for quite a considerable time before moving.
> 
> This leads me to believe we are saturating the X server with drawing requests,
> and that they are queuing up on the server behind time.

Before making outlandish statements, read the code.

We render the data lines once per sample, we sample the data once per second by default, in between times we slide the upper surface (data) over the lower surface (graph lines).

What is actually needed is to have the data lines re-render with existing data on configure, this would put a little more pressure on the CPU when resizing the window (but this isn't something you do very often). This is purely cosmetic.

So we're not flooding the X server with drawing requests at all, we're not actually drawing anything but the graph lines in this period of hiccup.

Comment 51 Alexander “weej” Jones 2008-04-10 17:03:09 UTC

Thanks for the explanation, Karl.

Why does it need to be this complicated? It's just a bunch of lines, after
all.

Comment 52 Karl Lattimer 2008-04-10 17:21:04 UTC

Why exactly is this complicated? If you look at the implementation you'll see that we draw the graphs using standard cairo functions, and we slap them together in a particularly simple way.

Take a look at; http://svn.gnome.org/svn/gnome-system-monitor/trunk/src/load-graph.cpp 

at the moment the graph uses 5 frames per sample (the default sample time being 1 second) the draw_background and load_graph_draw functions are pretty simple, with a very simple counter in load_graph_update to allow us to draw 4 frames based with a cached buffer, and 1 frame with new data caching and shuffling as we go, we have two data points which are not rendered, by means of clipping to fill the graph, so the graph is delayed by a maximum of one sample or less during the sample period.

its all flexible enough for us to define how many divisions or frames per sample, so basically it'll always shuffle along smoothly as we can make it, in the future we would of course want to arrange it so it retains its shape but the number of points increases as the graph gets wider and still draws a happy 20-30 fps... This is all based on the hope that one day cairo will be fast as hell!

Comment 53 Jörgen Scheibengruber 2008-04-10 18:05:19 UTC

Disclaimer: I haven't looked at the code very long, so take my comment with a grain of salt:

Of the top of my head I would say is what you want to get rid of here is drawing 62 cubic bezier splines (cairo_curve_to) every time you take a new sample. I have not used cairo a lot before, but I would expect that to be pretty expensive. Why not just copy the backbuffer and only draw the spline for the last sample?

If i understood Michael correctly, that was his suggestion, too?

Comment 54 Alexander “weej” Jones 2008-04-10 18:09:23 UTC

Or... Just draw straight lines. The bezier's are misrepresetative at the least. At least everyone knows what they're getting with a straight line.

Comment 55 Karl Lattimer 2008-04-10 18:31:41 UTC

@alex: misrepresentative how? I mean does a processor always have a linear falloff between two specific loads within an arbitrary time frame? The bezier's look nicer and less like windows' horrific unantialiased unimaginative task manager. 

@Jorgen: The data lines are only being drawn once per sample, this is EXACTLY the same as 2.22. This is definitely NOT what is causing the problem, there is no added drawing over 2.22 we simply combine the pre-rendered surfaces more often.

If it were the case that this was creating a 100% CPU load on some drivers then I'd expect the user would have experienced this exact same behaviour in 2.22

Maybe we could work out some magic so we only draw two points (clipping appropriately) to the cached surface every time we get a new sample, however this won't be a big win.

Something screwy is definitely going on in the driver...

Comment 56 Karl Lattimer 2008-04-10 18:34:28 UTC

If we don't get a reasonable answer/hack/work around to this bug soon I'm thinking it would probably be best to close it as NOTGNOME and farm it off to the fd.o bugzilla, it seems to be related to specific configurations of specific graphics cards, which is not something GNOME has the resources to cope with.

Comment 57 Alex Shulgin 2008-04-10 19:48:28 UTC

(In reply to comment #46)
> > Section "Device"
> >     Identifier     "nVidia Corporation NV43 [GeForce 6600 GT]"
> >     Driver         "nvidia"
> > EndSection
> > 
> 
> With nvidia, it is always best practice to ensure you have the render extension
> enabled, and fast writes, and side band addressing and lots of other tweaks,
> the default config isn't very good.

Hmmm... fast writes (whatever it is) is indeed not enabled on my box:

$ cat /proc/driver/nvidia/agp/status 
Status: 	 Enabled
Driver: 	 AGPGART
AGP Rate: 	 8x
Fast Writes: 	 Disabled
SBA: 		 Enabled

But I cannot seem to find how to enable it.  Adding `Option "AGPFastWrite" "true"' much suggested on the web doesn't help, since it's ignored by the NVIDIA driver:

(WW) NVIDIA(0): Option "AGPFastWrite" is not used

Can anyone help me on this?

--
Thanks,
Alex

Comment 58 Karl Lattimer 2008-04-10 19:50:31 UTC

try adding "optons nvidia NVreg_EnableAGPSB=1 NVreg_EnableAGPFW=1" to /etc/modprobe.conf

Comment 59 Michael Monreal 2008-04-10 19:56:26 UTC

Fast writes is not supported on all chipsets and even if it works it may make the system unstable, be warned. I don't think this is the right solution here anyway.

Comment 60 Martin Ejdestig 2008-04-10 20:25:44 UTC

(In reply to comment #59)
> Fast writes is not supported on all chipsets and even if it works it may make
> the system unstable, be warned. I don't think this is the right solution here
> anyway.

Well, workarounds for buggy or slow drivers is certainly NOT the right solution. (How many FPS do these chipsets churn out when running e.g. Doom 3? ;)

Comment 61 Alex Shulgin 2008-04-10 20:28:44 UTC

(In reply to comment #59)
> Fast writes is not supported on all chipsets and even if it works it may make
> the system unstable, be warned. I don't think this is the right solution here
> anyway.

Indeed, it doesn't help even if fast writes enabled.

Thanks for suggestion anyway. :-)

Comment 62 Jörgen Scheibengruber 2008-04-10 21:07:20 UTC

DISCLAIMER: again, i am not really well-informed, but i cannot refrain from commenting ;-)

If the bug is really in the driver, a workaround would be to apply double-buffering, i.e. do all cairo ops client side on a CAIRO_SURFACE_TYPE_IMAGE and then pushing that one to the server using gdk/xlib. That would probably decrease performance for systems with proper acceleration slightly, but it should fix the issue for broken drivers.

Another approach would be to render the two surfaces into different GdkDrawables and then compose them using gdk_draw_drawable. That way all ops would still have to the chance of beign accelerated, since executed on the server, but cairo is eliminated from the equation ;-) If the driver is seriously broken, that won't help a lot of course.

Comment 63 Jörgen Scheibengruber 2008-04-10 21:10:34 UTC

(eh just to make sure, my second suggestion is to not do the compositing using cairo, not to not use cairo at all, i.e. i am not suggesting to get rid of the fancy antialiased splines)

Comment 64 Martin Ejdestig 2008-04-10 21:12:43 UTC

Workarounds have a tendency to prolong the time until a proper fix is done. But it's not a black and white issue. Deciding for or against doing a workaround also depends on how serious the issue is. I doubt not being able to view graphs in the system monitor is a big deal for most users... So, if it really is a driver issue, I say mark it NOTGNOME and give feedback to the driver developers.

Comment 65 Toby Dacre 2008-04-10 21:13:42 UTC

(In reply to comment #53)

> Of the top of my head I would say is what you want to get rid of here is
> drawing 62 cubic bezier splines (cairo_curve_to) every time you take a new
> sample.

As Karl says this might make a small difference.  What seems to be the problem is compositing the surface.  More likely it is XRender that is the ultimate cause.

Things we should do

* There is a performance issue we need to find a way of minimising it. 

* Make sure upstream are aware of the problem if they are the cause.

* For machines experiencing this problem we should offer a way to revert to the one frame per second model (if we can't fix it).

We should _not_ close the bug because we feel it's someone else's problem (drivers or cairo) and leave users with a worse experience.  The smooth graph is nice if it works but it is _not_ everything g-s-m is about.  I want to use g-s-m to find out what is happening on my system not for it to use all my resources and look pretty.

Comment 66 Toby Dacre 2008-04-10 21:18:57 UTC

(In reply to comment #64)
> I doubt not being able to view graphs
> in the system monitor is a big deal for most users... So, if it really is a
> driver issue, I say mark it NOTGNOME and give feedback to the driver
> developers.
> 

Are you saying ditch the smooth scrolling until we can get it to work?

This is not being able to see the graphs.  It is being able to use g-s-m without it eating huge amount of cpu.  That is not very user friendly ;)

Comment 67 Martin Ejdestig 2008-04-10 21:23:58 UTC

(In reply to comment #65)
> * For machines experiencing this problem we should offer a way to revert to the
> one frame per second model (if we can't fix it).

So what do you suggest? A "I have a buggy driver" option? Not going to work.

Comment 68 Martin Ejdestig 2008-04-10 21:32:23 UTC

(In reply to comment #66)
> Are you saying ditch the smooth scrolling until we can get it to work?
> 
> This is not being able to see the graphs.  It is being able to use g-s-m
> without it eating huge amount of cpu.  That is not very user friendly ;)

No, I'm saying keep it as is. Viewing graphs in GSM is not a vital function (looking at the desktop as a whole). Most people can live without it. But if it exposes a driver issue it will presumably help speed up a proper fix.

(More apps need to do "effects" like this. The timing seems right, at least when it comes to the open source drivers...)

Comment 69 Toby Dacre 2008-04-10 21:43:32 UTC

(In reply to comment #67)
> (In reply to comment #65)
> > * For machines experiencing this problem we should offer a way to revert to the
> > one frame per second model (if we can't fix it).
> 
> So what do you suggest? A "I have a buggy driver" option? Not going to work.
> 

Yes.  If the rendering is accelerated we can move the number of frames per second way up and have a really smooth graph.  As I've set before the limit of 5 hurts everyone.

I do see you point about NOTGNOME but I'm not yet convinced. I have this problem so I'm affected.

Comment 70 Marcus Isaksson 2008-04-10 21:51:58 UTC

I just found out that the suggestion in comment #30 actually fixes the performance issues in Cairo that I describe in comment #43. 
Simply making sure that we scroll the graph an integral number of pixels decreases the cpu usage of X(cairo) with a factor of 3 or so, since Cairo no longer need to do bilinear interpolation.

Maybe we could at least try that. It's just five characters of code...

Comment 71 Toby Dacre 2008-04-10 22:07:48 UTC

--- /home/toby/svn2/trunk/src/load-graph.cpp    (revision 2380)
+++ /home/toby/svn2/trunk/src/load-graph.cpp    (working copy)
@@ -223,7 +223,7 @@
        tmp = (float)(g->draw_width - g->rmargin - g->indent) / (float)LoadGraph::NUM_POINTS;
        tmp = tmp / g->frames_per_unit;
        tmp = tmp * g->render_counter;
-       cairo_set_source_surface (cr, g->graph_buffer, g->graph_buffer_offset - tmp, FRAME_WIDTH);
+       cairo_set_source_surface (cr, g->graph_buffer, (int)( g->graph_buffer_offset - tmp),FRAME_WIDTH);
        cairo_rectangle (cr, g->rmargin + g->indent + FRAME_WIDTH + 1, FRAME_WIDTH - 1,
                         g->draw_width - g->rmargin - g->indent - 1 , g->real_draw_height + FRAME_WIDTH - 1);
        cairo_fill (cr);




Improve things a little but there is still something else that does need fixing.

Comment 72 Jörgen Scheibengruber 2008-04-10 22:17:21 UTC

Wow, cairo is pretty great :-) Is it really interpolating the two surfaces? And this only makes a difference of factor 3?
I don*t think that this should be a question. Interpolating the buffers is clearly not was the code was supposed to do, and it wont do any good either, I would expect the image quality to rather improve without interpolation...

Comment 73 Michael Meeks 2008-04-11 14:59:10 UTC

I look forward to testing it :-) In my experience, the slow bits in XRender are the compositing operations - ie. on some systems it is lightning fast to composite an alpha bitmap on a background; and in a large number of others it's -impossibly- slow :-)

On that basis - the optimization of drawing the grid-lines only once & compositing it on top (if I understand you rightly) would be (in fact) slower than simply re-rendering the grid-lines each time, over the top:

After all, the grid-lines look nicely de-generate; stippled lines, 1 pixel wide on integer offsets (which is great). If it's not lightning fast with cairo, it can be done with gdk [ I guess ].

Comment 74 Jörgen Scheibengruber 2008-04-12 21:43:53 UTC

Created attachment 109147 [details] [review]
Patch that tries to avoid cairo compositing

I just hacked up a quick and dirty patch that does the following:
- It caches only the background, but uses a GdkPixmap for caching instead of a cairo-surface
- The graph will be drawn directly onto the screen (no caching happening it's redrawn always)
- All three graphs are redrawn from one timeout_cb instead of each having its own

This is really just a quick'n'dirty test. Most notably it will break the graph-drawing a bit. But I see no sense in polishing it up, unless it actually fixes any issue ;-)

So if people that experience this bug could try this out?

Comment 75 Alexander “weej” Jones 2008-04-13 02:23:23 UTC

Yes, it fixes the terrible performance now. CPU usage still pretty high, though.

Comment 76 Karl Lattimer 2008-04-13 05:30:41 UTC

Re: Comment #71, this patch is nonsense, the graph needs to move by a float each time as the distance moved by a single frame is always 

(background width / number of samples) / number of frames per sample

Which will only rarely by an integer, if we lock this to an integer the smooth motion of the graphs is lost entirely.

Re: Comment #74, this is an interesting test case, I wish I could test it myself. However I'd rather not push a hack around up into GNOME, the patch isn't useless of course it provides a valuable test case. 

We really should push to get the drivers fixed, or at least sane default configurations pushed into the distributions. So I have submitted this bug to fd.o hopefully it will be handled correctly from there.

https://bugs.freedesktop.org/show_bug.cgi?id=15479

Comment 77 Marcus Isaksson 2008-04-13 05:40:25 UTC

(In reply to comment #74)

It seems to improve the situation a little bit compared to just rounding the shift to integers. At least in the beginning, when the graph does not fill the full space. After a while though, when the graph fills the full space it actually seems slower.

My (very rough) estimates on Core Duo 1.8Ghz, maximized window (1440x900) in an unaccelerated situation (gsm+Xorg cpu usage):

           Original   Integral Shift   Jörgen's Patch
Upon Start:    55%        20%              15%
After a while: 55%        20%              30%

For a more reasonable window size, you can divide those figures with 3.

Comment 78 Marcus Isaksson 2008-04-13 05:48:11 UTC

(In reply to comment #76)
> Re: Comment #71, this patch is nonsense, the graph needs to move by a float

Did you actually try it and look at the graph? 

Running with 1 second updates and 5 frames per second I can't notice the difference. I can't make the window smaller than roughly 800 pixels wide, so it should shift by 800/62/5=2.58 pixels each time. Applying #71 makes it shift by 2 pixels sometimes and 3 pixels sometimes which looks just as smooth.

Of course it might not look as good on higher frame-rates. If that's the case, we could easily solve it by only rounding the shift if say,
(background width / number of samples) / number of frames per sample > 2.0

Comment 79 Karl Lattimer 2008-04-13 06:15:33 UTC

I honestly don't need to test it, I had the code in this configuration long ago... It's senseless, and adjusting the sample time or the number of frames per sample makes it behave strangely.

This may look just as smooth for 5fps on your system, which is exhibiting this bug? however on my system you have jitters every redraw, the difference between 2/3 pixels may not seem like much to you but when we have frames > width/samples oh look what happens! 

Its not hard to get there, believe me... 

On my iMac I regularly stress test this graph at 50-100 fps just to see how it effects performance... I can imagine what moving a minimum of 0px? or 1px would do to the graphs when testing at those kinds of frame rates.

Of course, the graph shouldn't really go up that high... Well maybe it should but that's a different bug entirely one related to consistent smoothing for wider graphs.

Comment 80 Alexander “weej” Jones 2008-04-13 14:38:41 UTC

If you really need to be able to move the line by non-integer pixel distances, then just draw the background from the buffer surface, and then draw directly on top of that with the line drawing code.

You simply can't cheat here and only draw the new parts of the line if you need the whole rest of the line to be shifted by non integer pixel amounts.

And if the performance of this is too low, you are just going to have to bite the bullet and drop the bezier stuff and see if that makes it any faster.

Doing image resampling of the line overlay, tens of times per second is gonna be bad news for performance /and/ appearance. You know that, Karl, I know how much you love your line hinting. :P

Comment 81 Michael Meeks 2008-04-14 09:18:21 UTC

I was profiling this locally vs. the 'radeonhd' driver; and it seemed to me that the primary problem was that we used XRender to blit the CAIRO_CONTENT_COLOR (good) surface onto the window. This seems to be down to the fact that the GdkWindow (on my machine at least) has an Alpha visual, such that

cr = gdk_cairo_create (widget->window);

has an alpha surface; and thus cairo-xlib-surface.c '_recategorize_composite_operation' returns DO_XRENDER instead of the (~always accelerated) DO_XCOPYAREA - which sucks. I guess we don't want an alpha visual for the window we're rendering into: but ... in my few minutes of poking I didn't find out how to achieve that.

Comment 82 Jörgen Scheibengruber 2008-04-14 12:12:13 UTC

Ok, I had a chance to test this on two systems:
a) my work laptop, a 2 year old ibm t43 using the radeon driver on ubuntu gutsy. just checked the xorg.conf, no options are set for the driver
b) my home laptop, a 5-6 year old noname (athlon MP), also using the radeon driver on ubuntu feisty. didn't check the xorg.conf, but i'm pretty sure there isn't anyting fancy in there either

the setting was always 1 sample/second

the results with trunk:
a) complete unusable, the cpu is at 100% and every frame takes several seconds to be drawn, the whole system (not only gsm) becomes sluggish
b) no impact on system, although gsm is a bit sluggish. cpu is around 60%, but drawing is mostly smooth

the results with toby's patch (using int instead of float):
a) like before, no noticeable change
b) cpu at around 30%, gsm sluggishness mostly gone

my patch:
a) cpu is at around 20% for around 800x1000 window. if i fullscreen the window to 1680x1050, it goes up a bit to %25
b) cpu is at around 20% for fullscreen window wich is (1024x768)

old releases:
a) gutsy has 2.20.1, the cpu is around 15% for the "normal window", 20% for fullscreen
b) the feisty had still the old non-cairo using graphs. I think the cpu for those was around 15% on my laptop

Comment 83 Benoît Dejean 2008-04-14 13:09:39 UTC

*** Bug 528026 has been marked as a duplicate of this bug. ***

Comment 84 Martin Ejdestig 2008-04-15 08:44:39 UTC

http://lists.freedesktop.org/archives/xorg/2008-April/034710.html

Comment 85 Toby Dacre 2008-04-15 12:57:13 UTC

I have tried Jörgen's patch from comment #74

For me it has made a large difference in performance.  Also previously my graph had a well defined sine wave (of g-s-m's update frequency) that has disappeared.  I can now run g-s-m full screen with 0.25 second updates and with an unlimited frame rate (ie not stuck on 5).  This is nice :)


However it is, as stated by the author a dirty patch.  But as a proof of concept it is compelling.  

The patch does need cleaning up.

 * Renaming tmp_cr -> cr causes the patch to be bigger than needed which makes it harder to review, even if it is a nicer name ;)

 * Additional fixes like stopping compiler warnings do not need to be in here.  They can be a separate patch.

Some more work needs doing

 * The graph line is drawn in the wrong place.  It needs to move down and to the right (hardly a show stopper)

 * we need to 'clip' the drawing of the 1st and last line segments so they don't get drawn outside the graph (a little calculating may be needed, to keep the curves just perfect here)

I'm happy to help out with this.  

What would be helpful is for people who are _not_ currently experiencing this problem to try the patch and see if it causes any regressions on their system(s).  If it does no harm then I'd like to see a cleaned up version of this patch considered.

Comment 86 Karl Lattimer 2008-04-15 13:29:17 UTC

Thanks for your brief review of this patch Toby, I will be working on this myself with the hopes of having a reliable fix pushed up this evening, cleaned up and working nicely.

Comment 87 Alex Shulgin 2008-04-15 18:49:59 UTC

(In reply to comment #74)
> 
> This is really just a quick'n'dirty test. Most notably it will break the
> graph-drawing a bit. But I see no sense in polishing it up, unless it actually
> fixes any issue ;-)
> 
> So if people that experience this bug could try this out?

Thanks, just tried the patch. :)

(In reply to comment #75)
> Yes, it fixes the terrible performance now. CPU usage still pretty high,
> though.

Same thing here.  The CPU usage starts with 25% and then slowly grows up to 40%.

--
Regards,
Alex

Comment 88 Samuel Buffet 2008-04-16 19:25:24 UTC

Created attachment 109383 [details]
notgood

I am not sure that can help but as I've seen no description of what I have on my desktop in the thread, see attached pix

Look at the CPU, the colors and the xxx on the left!

I you had already those informations  -> Sorry for pollution.

Comment 89 Samuel Buffet 2008-04-16 19:26:05 UTC

Created attachment 109384 [details]
notgoodatall

Comment 90 Benoît Dejean 2008-04-17 08:19:29 UTC

(In reply to comment #89)
> Created an attachment (id=109384) [edit]
> notgoodatall
> 

This is bug http://bugzilla.gnome.org/show_bug.cgi?id=521930 which i've updated.

Comment 91 Karl Lattimer 2008-04-17 09:57:52 UTC

Benoit, these look like the old graphs, not the new ones, with the new colour buttons?! 

I'm still trying to find the time to fix up this bug using Jorgen's patch, and hopefully I'll be able to fix the problem as much as it can be without driver fixes. 

wrt the locale issue, I thought we had already pushed up all the strings for translation and there weren't problems.

Also I don't quite understand the issue with the numbers...

Comment 92 Samuel Buffet 2008-04-17 11:55:12 UTC

Created attachment 109414 [details]
trunk

(In reply to comment #91)
> Benoit, these look like the old graphs, not the new ones, with the new colour
> buttons?! .. 
> 

Same result when compiled manually 2.23.0 from svn trunk rev 2386 (asked for new gnome-common also)

No Compiz
NVidia 8500GT  (without nonfree drivers)

I have effects of the 2 bugs (507797 & 521930) at the same time. CPU increases (one core goes to 100%) and the *black boxes*. I obtain the same result changing height or width of the main window.

Comment 93 Karl Lattimer 2008-04-17 12:19:21 UTC

So you're saying that in trunk the black graphs are back?

Benoit do you know anything about this regression? I mean if I'm going to go in and fix the graph drawing using Jorgens suggested method then I'd like to know if someone else has committed to the trunk with something that's unclean. I for one can assure you that I haven't worked on this code since I left england aprx 1 month ago.

Comment 94 Benoît Dejean 2008-04-17 13:04:57 UTC

(In reply to comment #93)
> So you're saying that in trunk the black graphs are back?
> 
> Benoit do you know anything about this regression? I mean if I'm going to go in
> and fix the graph drawing using Jorgens suggested method then I'd like to know
> if someone else has committed to the trunk with something that's unclean. I for
> one can assure you that I haven't worked on this code since I left england aprx
> 1 month ago.
> 

Nop, it's the first time black graphs are reported. Please make sure to manually the line colors. I still don't quite understand the issue and don't whether the cure is better than the problem.

Comment 95 Karl Lattimer 2008-04-17 13:20:23 UTC

Benoit: Well the black graphs need to go at the very least...

I will investigate the regression and see if I can attribute blame if I can indeed reproduce the black graph problem once again. 

Once I've solved that I'll take a look into the patch provided by jorgen and start cleaning it up.

Jorgen: If you could advise what _YOU_ think should be done to clean this patch up then that would be great, Toby's review is enough for me to get started but I would like your thoughts on this issue. 

Hopefully 1hr of hacking and we'll have a viable fix for 2.22.1

Comment 96 Samuel Buffet 2008-04-17 16:38:08 UTC

(In reply to comment #93)
> So you're saying that in trunk the black graphs are back?
> 
> Benoit do you know anything about this regression? I mean if I'm going to go in
> and fix the graph drawing using Jorgens suggested method then I'd like to know
> if someone else has committed to the trunk with something that's unclean. I for
> one can assure you that I haven't worked on this code since I left england aprx
> 1 month ago.
> 

Well NO,  my mistake (shame on me) the truck is OK and everything is fine now!

Thanks to the team for that fix.

Comment 97 Karl Lattimer 2008-04-17 19:12:50 UTC

Created attachment 109450 [details] [review]
slightly improved version of jorgen's patch

Here's a new version of Jorgen's patch against trunk, here I've tidied up the drawing and clipping to fit within the bounding box correctly. Please review and identify any issues. I haven't done much to the original patch, other than removing the commented sections, reformatted the indentation added the clipping and tweaked the positioning.

Comment 98 Toby Dacre 2008-04-17 21:21:18 UTC

Karl,

The patch compiles and improves performance :)  
The 'clipping' is also fixed

thoughts...

patch line 316 assumes a fixed font width
+        x_offset = 20-tmp;
should be
+       x_offset = g->rmargin - tmp;

I've noticed that the line goes just above the 100% line (when compiling etc).  I think it's nice if it stay just inside (like it does at the bottom).  Here's how I did that.  I've changed the indent to make this readable

--- patch      (Karl)
+++ patch      (Toby)
@@ -233,6 +233,7 @@
 
 x_offset = g->rmargin - tmp;
 
+guint draw_height = g->real_draw_height - 3;
 
 for (j = 0; j < g->n; ++j) {
        cairo_move_to (cr,
@@ -245,11 +246,11 @@
                continue;
        cairo_curve_to (cr, 
               x_offset + (g->draw_width - (i-1) * g->graph_delx) - (g->graph_delx/2),
-              (1.0f - g->data[i-1][j]) * g->real_draw_height+3,
+              (1.0f - g->data[i-1][j]) * draw_height + 6,
               x_offset + (g->draw_width - i * g->graph_delx) + (g->graph_delx/2),
-              (1.0f - g->data[i][j]) * g->real_draw_height+3,
+              (1.0f - g->data[i][j]) * draw_height + 6,
               x_offset + g->draw_width - i * g->graph_delx,
-              (1.0f - g->data[i][j]) * g->real_draw_height+3);
+              (1.0f - g->data[i][j]) * draw_height + 6);
 }
 
 cairo_stroke (cr);


bonus points: remove the now redundant/misleading comment in load-graph.cpp
/* Redraws the backing buffer for the load graph and updates the window */

Comment 99 Toby Dacre 2008-04-17 22:09:31 UTC

some other bits

this seems to be unwanted
-	context.set_summary(_("A simple process and system monitor."));
+	//context.set_summary(_("A simple process and system monitor."));

the frames per unit is locked to 5 by this patch but we can unpick that later

Comment 100 Karl Lattimer 2008-04-18 04:56:09 UTC

TESTERS TESTERS TESTERS!!!

Anyone still suffering from this problem please test it and report back whether or not you are still experiencing difficulty.

Benoit: is comment #99 valid? Is this unwanted? Also, is there any chance we can get this branched as 2.22.1 soon?

Toby: All good points in comment #98, as you can probably see I made a guess at the units to offset, would you like to fix up the original patch and provide a third patch, while you're at it, two other things need to be done; 1) Formatting needs to be fixed here and there, change space indents to tab indents and 2) run it past valgrind to test for any memory leaks, I couldn't see any but there may be the odd one.

Jorgen: Don't forget I owe you a beer for this :) Are you coming to Guadec? Get yourself on the list, speak to Quim!

Comment 101 Karl Lattimer 2008-04-18 05:02:22 UTC

** Important note ** 

One day all of this work will be obsolete, its a shame we've had to create a workaround to this problem because of some bad driver/chip combinations fd.o are aware of the problem and hopefully X.org will be fixed up in the future, as mentioned here http://lists.freedesktop.org/archives/xorg/2008-April/034710.html there is already a fix available for ATI cards, i don't think the nvidia bug is common, but the intel one might be. 

One day there'll be a fix for those too, and hopefully some day soon hardware accelerated cairo will make all of this a thing of the past and give us kickass frame rates.

Comment 102 Alex Shulgin 2008-04-18 06:32:55 UTC

(In reply to comment #100)
> TESTERS TESTERS TESTERS!!!
> 
> Anyone still suffering from this problem please test it and report back whether
> or not you are still experiencing difficulty.

Yes, it fixes the problem on my box.

Now the graphs are eating ~40% of CPU time (at most--70%, when maximized), but the system is still pretty usable.  These figures are for the 0.25 sec refresh rate--for 1 sec both are below 20%.

Thanks!

--
Alex

Comment 103 Benoît Dejean 2008-04-18 06:59:15 UTC

(In reply to comment #100)
> TESTERS TESTERS TESTERS!!!
> 
> Anyone still suffering from this problem please test it and report back whether
> or not you are still experiencing difficulty.
> 
> Benoit: is comment #99 valid? Is this unwanted? Also, is there any chance we
> can get this branched as 2.22.1 soon?
> 

IIRC that GOptions part was useless.

I'm think we should first patch trunk so we get more testers and then later merge into branches/gnome-2-22.

Comment 104 Benoît Dejean 2008-04-18 07:11:56 UTC

(In reply to comment #102)
> (In reply to comment #100)
> > TESTERS TESTERS TESTERS!!!
> > 
> > Anyone still suffering from this problem please test it and report back whether
> > or not you are still experiencing difficulty.
> 
> Yes, it fixes the problem on my box.
> 
> Now the graphs are eating ~40% of CPU time (at most--70%, when maximized), but
> the system is still pretty usable.  These figures are for the 0.25 sec refresh
> rate--for 1 sec both are below 20%.

system-monitor or X ?

Comment 105 Alex Shulgin 2008-04-18 07:16:09 UTC

> > 
> > Yes, it fixes the problem on my box.
> > 
> > Now the graphs are eating ~40% of CPU time (at most--70%, when maximized), but
> > the system is still pretty usable.  These figures are for the 0.25 sec refresh
> > rate--for 1 sec both are below 20%.
> 
> system-monitor or X ?

X, surely.

--
Alex

Comment 106 Benoît Dejean 2008-04-18 07:21:12 UTC

(In reply to comment #97)
> Created an attachment (id=109450) [edit]
> slightly improved version of jorgen's patch
> 
> Here's a new version of Jorgen's patch against trunk, here I've tidied up the
> drawing and clipping to fit within the bounding box correctly. Please review
> and identify any issues. I haven't done much to the original patch, other than
> removing the commented sections, reformatted the indentation added the clipping
> and tweaked the positioning.
> 

-	unsigned pow2 = std::floor(log2(new_max));
-	unsigned base10 = pow2 / 10;
-	unsigned coef10 = std::ceil(new_max / double(1UL << (base10 * 10)));
+	unsigned pow2 = (unsigned)std::floor(log2(new_max));
+	unsigned base10 = (unsigned)(pow2 / 10);
+	unsigned coef10 = (unsigned)std::ceil(new_max / double(1UL << (base10 * 10)));
 	g_assert(new_max <= (coef10 * (1UL << (base10 * 10))));
 
 	// then decompose coef10 = x * 10**factor10
 	// where factor10 is integer and x < 10
 	// so we new_max has only 1 significant digit
 
-	unsigned factor10 = std::pow(10.0, std::floor(std::log10(coef10)));
-	coef10 = std::ceil(coef10 / double(factor10)) * factor10;
+	unsigned factor10 = (unsigned)std::pow(10.0, std::floor(std::log10(coef10)));
+	coef10 = (unsigned)std::ceil(coef10 / double(factor10)) * factor10;


please don't touch unrelated parts (and casts are useless there)

oh, and i don't understand why you've mixed in a change about the update timers (from 3 to 1, which may render the UI bursty since there won't be any UI update during the whole graph update).

Comment 107 Karl Lattimer 2008-04-18 07:33:21 UTC

This was included from Jorgen's patch, I noticed it also but we can strip it out.

Comment 108 Michael Meeks 2008-04-18 10:02:44 UTC

Yes, this fixes performance perfectly for me; a few % to render the graphs even full-screen, and they still look beautiful. Many thanks.

Comment 109 Karl Lattimer 2008-04-18 12:36:01 UTC

Created attachment 109482 [details] [review]
much improved patch

Further improved patch, which is self contained in load-graph.cpp, fixed timing issues and repaired various niggles in the previous patch.

Comment 110 Toby Dacre 2008-04-18 14:08:09 UTC

Created attachment 109488 [details] [review]
another alternative

Karl,

I've been playing and this is my minimised patch based on your #97 patch.

I'd found/fixed some issues with;

timings
scale changes
fixed frames_per_unit being used

I also made it a small readable as I could
hey duplication is fun ;)

Comment 111 Jörgen Scheibengruber 2008-04-18 20:46:38 UTC

> -       unsigned factor10 = std::pow(10.0, std::floor(std::log10(coef10)));
> -       coef10 = std::ceil(coef10 / double(factor10)) * factor10;
> +       unsigned factor10 = (unsigned)std::pow(10.0,
> std::floor(std::log10(coef10)));
> +       coef10 = (unsigned)std::ceil(coef10 / double(factor10)) * factor10;
> 
> 
> please don't touch unrelated parts (and casts are useless there)
>
> oh, and i don't understand why you've mixed in a change about the update timers
> (from 3 to 1, which may render the UI bursty since there won't be any UI update
> during the whole graph update).
 
well as I said, my patch was not cleaned up and should be probably split.

regarding the casts: either these are possible bugs (double does not necessarily fit unsigned int), or the compiler should be made to shut up, but that's just my opinion

regarding the 3 updates into one: it lookes better imho, if all the graphs are in sync. with 3 timeouts the 2nd and 3rd are limping behind. probably pretty subtle issue, but it gives a bad overall impression. if you are concerned about ui responsiveness then you should consider the rendering of the graphs as a whole, whether the graphs are rendered in one swap or in 3 makes little difference

Comment 112 Karl Lattimer 2008-04-19 07:34:17 UTC

Created attachment 109518 [details] [review]
merged patches of mine and tobys

I've reviewed both patches, compared them and determined that they are almost identical, Toby had some minor details above mine, and mine had some minor changes over his, nothing major but merged the patches together... Now we have the super patch. 

This needs reviewing, but that shouldn't take too long... So I'll be ready to commit once I get the commit-now go ahead from Benoit...

Comment 113 Toby Dacre 2008-04-19 10:46:06 UTC

I've tested this latest patch it all appears to work as expected :)

There are some regressions as far as a variable frame rate is concerned but as this is not currently enabled I do not think this is a big problem.  When this feature is enabled it should be easy to get it working.  For the current build it is unneeded code.

I am happy for this patch to be committed.

Comment 114 Toby Dacre 2008-04-19 11:03:25 UTC

I have found one issue

patch line 318

+       x_offset + (g->draw_width - (i-1) * g->graph_delx) - (g->graph_delx/2),
+       (1.0f - g->data[i-1][j]) * g->real_draw_height+3.5,
+       x_offset + (g->draw_width - i * g->graph_delx) + (g->graph_delx/2),
+       (1.0f - g->data[i][j]) * g->real_draw_height+3.5,
+       x_offset + g->draw_width - i * g->graph_delx,
+       (1.0f - g->data[i][j]) * g->real_draw_height+3.5);


should be

+       x_offset + (g->draw_width - FRAME_WIDTH - (i-1) * g->graph_delx) - (g->graph_delx/2),
+       (1.0f - g->data[i-1][j]) * g->real_draw_height+3.5,
+       x_offset + (g->draw_width - FRAME_WIDTH - i * g->graph_delx) + (g->graph_delx/2),
+       (1.0f - g->data[i][j]) * g->real_draw_height+3.5,
+       x_offset + g->draw_width - FRAME_WIDTH - i * g->graph_delx,
+       (1.0f - g->data[i][j]) * g->real_draw_height+3.5);

Without the FRAME_WIDTH the graph doesn't quite get to the left hand edge.

Comment 115 Karl Lattimer 2008-04-19 11:26:32 UTC

actually toby it should be; 

(float)(g->draw_width - g->rmargin - g->indent) / (float)LoadGraph::NUM_POINTS

not frame width, as it is simply too far to the right by one point width. 

I've refactored this code nicely so it's easier to read and will upload the patch in a moment.

Comment 116 Karl Lattimer 2008-04-19 11:32:16 UTC

Created attachment 109528 [details] [review]
added improved graph line position calculation

Changed the majority of the code for calculation of the graph offset to allow the graph to reach the left hand edge, and take some calculations out of the double for loop.

Comment 117 Toby Dacre 2008-04-19 14:01:14 UTC

hmm,

Almost there!  I now have an issue with the line not getting quite the right but only noticeable with full screen.

Comment 118 Karl Lattimer 2008-04-19 14:28:36 UTC

Created attachment 109542 [details] [review]
Fixed position... again!

Looks like when I rejiggered the position calculations I missed something, it should be increased by half a sample width now... Tested and works perfectly, even the nasty jumpy bit is gone...

tip: turn off the clipping line if this goes hairy again.

Comment 119 Toby Dacre 2008-04-19 15:52:26 UTC

Karl,

I'm back to the previous situation with the line not getting all the way to the left.  This only happens when the graph is at the minimum width.

I think this needs some further work as there are things like NUM_POINTS including the 2 'click' buffer.  I'm not sure that this is always taken account of.

Comment 120 Karl Lattimer 2008-04-19 16:26:09 UTC

Created attachment 109547 [details] [review]
and again!

of course toby!

But its not the NUM_POINTS i wasn't taking into account, it was the rmargin. I though that because it was too high I should use half the sample width, this was dumb! I needed to subtract rmargin, then I realise that the difference is exactly the width (or 2px shy of) double the sample_width... So I was being a bit dumb, and I only checked it full screen because I didn't spot the obvious...

Never mind, this is proper now :)

Comment 121 Toby Dacre 2008-04-19 16:48:21 UTC

maybe turn the clipping back on ;)

	cairo_rectangle (cr, g->rmargin + g->indent + FRAME_WIDTH + 1, FRAME_WIDTH - 1,
			 g->draw_width - g->rmargin - g->indent - 1, g->real_draw_height + FRAME_WIDTH - 1);
	cairo_clip(cr);

I'm happy, good work!

Comment 122 Karl Lattimer 2008-04-19 17:52:35 UTC

good catch toby :) 

FIXED, Commited rev 2388

Comment 123 Alex Shulgin 2008-04-19 18:23:48 UTC

(In reply to comment #122)
> good catch toby :) 
> 
> FIXED, Commited rev 2388

Nice job, thanks! :)

Just a minor issue: it now constantly fills up stdout with the following lines on my box:

556.645161, 8.322580
556.645161, 8.322580
556.645161, 8.322580
...

This is most probably caused by line #219 in load-graph.cpp:

g_print ("%0f, %0f \n", x_offset, sample_width);

--
Regards,
Alex

Comment 124 Karl Lattimer 2008-04-19 18:34:39 UTC

Nice catch, removed from trunk

Comment 125 Jakob Unterwurzacher 2008-05-13 19:05:35 UTC

Ubuntu now ships this patch - it is a good improvement.

However, for many people (me f.x.), the graphs are still too CPU-heavy to be actually useful for CPU monitoring. (Measuring alters the outcome significantly)

Please see those examples from https://bugs.launchpad.net/ubuntu/+source/gnome-system-monitor/+bug/187383 : 
* Athlon XP 2500+
  - maximized window @1600x1200:
    90% CPU load
    http://launchpadlibrarian.net/14431401/Fullscreen_systemmonitor_with_fix.png
  - normal window size:
    30% CPU load
    http://launchpadlibrarian.net/14431521/Normal_sized_systemmonitor_with_fix.png
* Core 2 Duo @1Ghz (SpeedStep)
  - maximized window @1280x800:
    15% on each core (that makes 30 or something)
    http://launchpadlibrarian.net/14403113/gsm-hardy-proposed.png
  - normal window size (800x600)
    10% on each core (that makes 20)

Functionality should be the first priority over eye-candy, i think there should be an option to disable smooth scrolling (or cubic bsplines or whatever is most CPU-hungry and the easiest to implement).

Comment 126 Benoît Dejean 2008-05-13 19:34:05 UTC

Then why don't you report these bugs to the driver writers ? system-monitor runs the graphs at less than 10% of CPU usage on my 500MHz ibook with radeon.

Because i really don't see how we could do less with cairo: it's just a basic line with 100 points and a background. I'm sorry but with either have to blame X/drivers or drop cairo rendering.

Comment 127 Jakob Unterwurzacher 2008-05-13 19:46:09 UTC

Reporting to the driver writers is a good idea, of course.

Wouldn't redrawing only once per sampling time help?

Comment 128 Benoît Dejean 2008-05-13 19:58:20 UTC

Well, you reported that 20% on a core2duo is excessive. If we go from 5 fps to 1 fps, the Althon would still run at 20%. That would still be far too much on a computer that can run quake 3 at more than 100fps.

Sorry, i don't want to be rude, but on one hand there are users who complain about the jumping graphs and on the other hands one who reports very high cpu usage.

nb: i don't think this is related to the bézier rendering. when we switched to it, nobody reported any performance problem.

Comment 129 Jakob Unterwurzacher 2008-05-13 20:20:50 UTC

> Sorry, i don't want to be rude, but on one hand there are users who complain
> about the jumping graphs and on the other hands one who reports very high cpu
> usage.

That's why i would make smooth scrolling optional (default on).
"View -> Smooth Scrolling" or something like that.

> Well, you reported that 20% on a core2duo is excessive. If we go from 5 fps to
> 1 fps, the Althon would still run at 20%. That would still be far too much on a
> computer that can run quake 3 at more than 100fps.

Going from 5 FPS to 1 (optionally!) would be a 5-fold improvement. Would help a lot IMO.
That a scrolling graph uses 100% CPU on a PC that can run quake is ridiculos of course. But solving the underlying problems will take some time i guess. And there are still the users who don't have a decent graphics card.

Comment 130 Vincent Trouilliez 2008-05-14 15:24:35 UTC

I am hit by this too. Drawing graphs smotthly and fast should not take more than 1% of a CPU whatever the CPU, because it's just not justifiable :-/
If Cairo is the problem then dropping Cairo is the way to go I guess...
There has to be other ways/functions/l:ibraries to draw graphs fast and smoothly without using any significant/noticable CPU resources. I mean Cairo is fairly new, but I guess it's been years 10+ years since some linux application or another, had to draw graphs... let's just see how they did it, and do the same ! :-/
As a start, what springs to mind, is the "visualization" plug-ins that are so common in music players: they can draw an "oscilloscope" like graph, very similar to the CPU graphs in g-s-m me think, perfectly smoothly and very fast, yet I don't think they use 60% of the CPU like g-s-m does on my machine.

So why not take a look at the code that XMMS and friends use to draw these graphs ?

Don't jump on me, am just trying to be useful somehow ;-)

Comment 131 Benoît Dejean 2008-05-14 17:49:47 UTC

(In reply to comment #130)
> 
> So why not take a look at the code that XMMS and friends use to draw these
> graphs ?
> 
> Don't jump on me, am just trying to be useful somehow ;-)
> 

Maybe it would be more useful to write a benchmark and start to point at nasty drivers. I have no idea about how much people have this bug, the only hardware n which i could reproduce was actually a vmware...

Comment 132 Karl Lattimer 2008-05-14 18:29:39 UTC

Just one comment to make because I'm too busy to reply with all my whits.

I am currently using vesa driver for X because there is no fglrx on fedora9 yet.

This uses ~1% CPU all of the time, the fglrx driver uses about 15% with this current graph drawing method.

The vesa driver uses no hardware acceleration so therefore something is very very ill with other drivers.

That is all. Will reply in detail later.

Comment 133 Jakob Unterwurzacher 2008-05-14 21:06:05 UTC

@Benchmark: We should probably agree on a windows size at least. 800x600 seems reasonable to me.

@Karl and buggy drivers:
1%, wow.
BTW: This (22% CPU)
http://launchpadlibrarian.net/14501052/gsm_800x600.png
is with nvidia-glx-new 169.12+2.6.24.12-16.34 . And compiz, if that matters.

Comment 134 Michael Meeks 2008-05-15 11:48:28 UTC

Presumably if we sacrificed things like "grid stays still, while lines move" and scrolled the grid with the nice graph - we could pull all the old tricks of moving windows under windows, pure XCopyArea acceleration goodness & so on - while keeping the nice spline / anti-aliased rendering etc.

Of course that may suck in other ways ;-) but it's notable that Windows (for all it's ugly, flashing, jerky 'CPU Usage History' scrolls the grid-lines as well as the graph.

Comment 135 Jakob Unterwurzacher 2008-05-15 17:42:05 UTC

Lines that mark fixed points in time seem more useful to me anyway. But then the time-caption (19:20:00 ... 19:20:10 ... 19:20:20 ... etc ) would have to scroll, too.

Comment 136 Michael Meeks 2008-05-15 19:51:57 UTC

Sure - agreed; but there is quite a strong visual cue as to the scale on the time-axis if you watch the graph for a few seconds (as the user no doubt will ;-) and doing this would nail ~all the performance problems at a stroke - while keeping the smooth scrolling (I suspect).

Finally - out of interest - why is the top-level window for the system monitor a 32bit depth window [ presumably this is also not helpful performance wise ] - the vast majority of apps on my desktop are 24bit (according to xwininfo).

Comment 137 Benoît Dejean 2008-06-30 21:52:43 UTC

*** Bug 540542 has been marked as a duplicate of this bug. ***