Bug 410534 – Slow content scrolling, takes 100% of CPU.

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 410534 - Slow content scrolling, takes 100% of CPU.


Summary:	Slow content scrolling, takes 100% of CPU.


Status:	RESOLVED FIXED

Product:	vte
Classification:	Core
Component:	general
Version:	0.15.x
Hardware:	Other Linux

Importance:	High major
Target Milestone:	---
Assigned To:	VTE Maintainers
QA Contact:	VTE Maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2007-02-21 21:10 UTC by Karol Bryd
Modified:	2007-03-01 00:32 UTC

See Also:
GNOME target:	---
GNOME version:	2.17/2.18

Attachments
sysprof screenshot of 0.14.1 libvte - this version works fine. (119.58 KB, image/png) 2007-02-21 21:13 UTC, Karol Bryd		Details
sysprof screenshot of 0.15.3 libvte (111.37 KB, image/png) 2007-02-21 21:13 UTC, Karol Bryd		Details
Comment out all vte_xft_clip code... (1.01 KB, patch) 2007-02-28 00:18 UTC, Mike Auty	none	Details \| Review
Causes the scrolling slow down (31.18 KB, text/plain) 2007-02-28 21:53 UTC, Mike Auty		Details

Description Karol Bryd 2007-02-21 21:10:51 UTC

I have noticed that in 0.15.3 scrolling gnome-terminal's content using mouse is very slow and what's worse uses 100% of CPU. After downgrading to 0.14.1 everything is fine...

Gnome-terminal version is 2.17.91, distribution is Ubuntu Feisty, I am using ATI closed source video drivers, direct rendering is enabled, all other applications are working fine.

Comment 1 Karol Bryd 2007-02-21 21:13:08 UTC

Created attachment 83066 [details]
sysprof screenshot of 0.14.1 libvte - this version works fine.

Comment 2 Karol Bryd 2007-02-21 21:13:41 UTC

Created attachment 83067 [details]
sysprof screenshot of 0.15.3 libvte

Comment 3 Chris Wilson 2007-02-21 21:46:14 UTC

Karol, you can save the profiles either from the File menu or from the SaveAs toolbar button.

Could you describe your testing methodolgy as all it immediately seems to be is the change between 0.14 and 0.15 to reduce the number of lines that the mouse wheel scrolls by.

Comment 4 Karol Bryd 2007-02-21 22:10:59 UTC

As of methodology....there is none :) I have simply noticed that is works noticeably slower. 

For example, if I scroll terminal window's content CPU usage jumps to about 100%. In 0.14.x CPU usage is about 16%. These numbers are from htop which was run in one terminal, the other one contained output from /var/log/messages and window was scrolled using mouse wheel.

Another example. Gnome-terminal with running fgl_glxgears, when fgl_glxgears outputs current FPS rate then rendering stops for a fraction of second, nothing similar happens in 0.14.x.

Comment 5 Chris Wilson 2007-02-21 23:36:00 UTC

I've reviewed the mouse scrolling code and tweaked HEAD slightly.

r1716: 2007-02-21  Chris Wilson  <chris@chris-wilson.co.uk>

	cf Bug 410534 – Slow content scrolling, takes 100% of CPU.

	* src/vte.c: (vte_terminal_scroll),
	(vte_terminal_set_scrollback_lines):
		Operate on scroll delta directly as adjustment->value updates
		are not instantaneous and we may have several scroll events
		before the next update.


However, judging from your comments, I think there may be a severe renderering performance regression wrt 0.14.x. Could you please profile HEAD and see if the situation has improved? Thanks.

Comment 6 Mike Auty 2007-02-26 09:19:25 UTC

I also just built 0.15.3 (and have a radeon using fglrx with DRI acceleration enabled) and am also experiencing huge delays/CPU usage when scrolling.  The scrolling doesn't have to be using the mousewheel, just dragging the scrollbar (or having a program that regularly outputs to stdout) causes the extreme slow downs. 

It seems mostly to occur after the scrollback buffer has filled to a sufficient size.  So if there is only a small amount that can be scrolled it's all pretty fast no matter how far you want to go, however if the buffer is filled with text even moving up a couple of lines seems to exhibit the symptoms.  I haven't yet been able to try out HEAD, but I'll see if I can get it built soon and test it out...

Comment 7 Chris Wilson 2007-02-26 09:25:18 UTC

Mike, Karol - any insights you can provide today would be most helpful. We're due to make the next release at midnight and I'd like to fix this regression before then :-)

Thanks.

Comment 8 Karol Bryd 2007-02-26 10:19:13 UTC

I had no time yet to test the HEAD, but at least managed to build it...

But I have one more test for you:

Put some files with long names (at least 80 characters) in a folder and list them "ls -al", then try scrolling the window content. I have noticed that when it comes to scroll these long file names then CPU usage jumps to 90%, when there isn't much text in a line then it works acceptably (CPU usage is about 30%).

I can't test now with 0.14.x version (will do it after work) but I am pretty sure that there was no such slowdown...

Comment 9 Mike Auty 2007-02-26 11:03:41 UTC

That would fit with the few tests I've managed.  Trying to compile something (and thus having the extremely long output lines) ran really slowly, whereas compilation on vte 0.14 was fine.  I'll have more of a chance to test this evening once I'm back from work...

Comment 10 Chris Wilson 2007-02-26 16:28:03 UTC

For testing, I did 
$ seq 1 1e7 | tr \\n \  > long-lines.txt; time cat long-lines.txt

Apart from the unfortunate stutter where we tried to read at ~80MiB/s from the child process but could only process about ~5MiB/s, there was no apparent slow down. Both vte-0.14 and HEAD processed consumed similar amounts of CPU time, except that HEAD completed in less than half the time of vte-0.14.

After imposing a heuristic to prevent processing overruns (aiming to keep the refresh and scrolling smooth), there is little to distinguish the behaviour of the two versions on my boxen.

Similary with manually scrolling with long lines in the buffer.

Can you try the test above to check that it does indeed show a difference between vte-0.14 and vte-0.15? [i.e. that it triggers this regression! ;-]

Comment 11 Mike Auty 2007-02-26 22:33:56 UTC

I can confirm that the test line you provided triggers the regression.

The first run (vte-0.14.2) took about 47 seconds.  

The second run (vte-0.15.3) took 7 minutes 20 seconds
(all in real: user 0m0.007s, sys 0m0.707s).

I haven't yet had a chance to test HEAD yet.

Comment 12 Mike Auty 2007-02-26 23:10:20 UTC

Sadly, it looks like we missed the last check-in.  About an hour ago 0.15.4 was tagged in SVN, and that version took 10 minutes and 2 seconds, so it looks like the regression hasn't been fixed.

On another box, this time with an nvidia card (with DRI), 0.14.2 took 48 seconds, whilst 0.15.3 took 44 seconds, so it seems to be rather specific to the fglrx driver, if that helps figure out the problem?

Comment 13 Chris Wilson 2007-02-26 23:18:31 UTC

Ouch, an order of magnitude slower! I think I may have upset the fglrx driver...

For comparison, can you try the same long-lines test but launch g-t with:
VTE_BACKEND=ft2 gnome-terminal --disable-factory
which tells vte to use the FreeType2 backend and not libXft. Double-checking against vte-0.14 would be useful, except you need to use:
VTE_USE_XFT=0 gnome-terminal --disable-factory
instead.

Along that path my prime suspect is the introduction of using clipping alongside libXft. You can check this by commenting out the contents of vtexft.c::_vte_xft_clip().


Hmm, don't have access to a radeon+fglrx here (the only radeon I have is on a ppc) so I'm going to need plenty of feedback to find an acceptable workaround.

Comment 14 Mike Auty 2007-02-26 23:40:02 UTC

Yep, running those commands seemed to make the difference.

0.15.4 with VTE_BACKEND=ft2 took 36 seconds.

0.15.4 without VTE_BACKEND=ft2 took 8 minutes 48 on the second run.

0.14.2 with VTE_USE_XFT=0 took 50 seconds.

I'm afraid I can't test much more tonight.  I'll try commenting out the xft_clip code tomorrow sometime...

Comment 15 André Klapper 2007-02-27 00:34:32 UTC

uhm, this could become a potential release blocker. :-(

Comment 16 Chris Wilson 2007-02-27 17:30:48 UTC

Mike (or anyone in a position to test ;), Behdad wishes to make a new release shortly, primarily to push out the fix for bug 412562. Could you endeavour to see if the slow down is solely due to the _vte_xft_clip()? If so, we can then include the work-around and hopefully have a happy user base in time for gnome-2.18. :-)

Comment 17 Mike Auty 2007-02-28 00:18:52 UTC

Created attachment 83507 [details] [review]
Comment out all vte_xft_clip code...

Ok, please see the attached patch, which simply comments out the _vte_xft_clip() function.  Sadly it appears only to reduce the problem very slightly, but still doesn't fix it by a long shot (and at each version bump, the figures seem to be going up?).

0.15.5  XFT - No patch - 14m5.999s
0.15.5  XFT - Patch - 13m30.932s
0.15.5  FT2 - Patch - 0m36.597s

I'll be quite happy to test any patches you may have, or if there's any other tests I can try please let me know.  I'd also like to see this hit on it's head (if it's of any use, I tried a very slow machine with an i810 card and DRI, and the 0.15.3 was faster than 0.14.2)...

Comment 18 Chris Wilson 2007-02-28 10:02:39 UTC

Thanks Mike for running that test. I really hoped for a nice simple explanation!

Another major change was the increasing of max number of glyphs passed to libXft, can you try reducing it?


Index: src/vtexft.c
===================================================================
--- src/vtexft.c        (revision 1775)
+++ src/vtexft.c        (working copy)
@@ -41,6 +41,9 @@
 
 #define DPY_FUDGE 1
 
+#undef VTE_DRAW_MAX_LENGTH
+#define VTE_DRAW_MAX_LENGTH 80
+
 struct _vte_xft_font {
        guint ref;
        Display *display;

Comment 19 Mike Auty 2007-02-28 10:14:36 UTC

I think that might've been it!  Applying the above patch to 0.15.5 gave me the following:

real    0m34.128s

Which seems to have cured it!  5:)

Comment 20 Chris Wilson 2007-02-28 10:19:56 UTC

* swears under his breath.

Ok, now that I know what the cause was, time for a bit of googling.
Normal service will be resumed shortly ;-)

Comment 21 Chris Wilson 2007-02-28 10:41:44 UTC

Bah, google revealed nothing!
Mike, can you try different max lengths to find the turning point in the curve? Thanks.

Comment 22 Mike Auty 2007-02-28 12:49:35 UTC

Right, here's some preliminary results.  It seems there's very little impact up until about 320 after which it starts taking off.  Given that it seems to be such a sharp increase, I'm wondering if it's some kind of memory exhaustion and therefore might be different for people with different amounts of video memory or slightly different hardware?  These figures could do with testing by other people, but for me, I'd say 320-340 was reasonable.

VTE_DRAW_MAX_LENGTH = 80        real    0m33.4s
VTE_DRAW_MAX_LENGTH = 160       real    0m33.4s
VTE_DRAW_MAX_LENGTH = 320       real    0m33.4s
VTE_DRAW_MAX_LENGTH = 340       real    0m33.5s
VTE_DRAW_MAX_LENGTH = 350       real    0m33.9s
VTE_DRAW_MAX_LENGTH = 360       real    0m35.8s
VTE_DRAW_MAX_LENGTH = 370       real    0m40.9s
VTE_DRAW_MAX_LENGTH = 380       real    0m57.5s
VTE_DRAW_MAX_LENGTH = 400       real    3m18.1s
VTE_DRAW_MAX_LENGTH = 480       real    Too long
VTE_DRAW_MAX_LENGTH = 540       real    Whay too long

Above 340 it felt as though the numbers were sluggish up to 1e1, and afterwards went up at a reasonable speed.  At 340 and less the speed seemed pretty constant throughout, although maybe that's just my eyes.  5;)  Hope this helps!

Comment 23 Chris Wilson 2007-02-28 13:08:01 UTC

Thanks Mike for the analysis.

r1777: 2007-02-28  Chris Wilson  <chris@chris-wilson.co.uk>

	Bug 410534 – Slow content scrolling, takes 100% of CPU.

	Submitting long glyph runs was causing a dramatic (10x) slow down
	in the fglrx xserver.

	* src/vtedraw.h 
	* src/vtexft.c	(_vte_xft_draw_text):
		Cap the max glyph run length to 300.

Comment 24 Behdad Esfahbod 2007-02-28 21:16:14 UTC

With not much reason, I'd go for 240...  (as in 256 - 16).

Comment 25 Mike Auty 2007-02-28 21:22:40 UTC

I hate to say it, but I also found that at 320 (which is what I left my final patch value at), it could still be a little sluggish dragging the scrollbar on a long page of compilation results (rather a common occurance on Gentoo!), but reducing it down to 160 left the scrolling extremely responsive.  I'll give 240 a go and report back the results...

Comment 26 Mike Auty 2007-02-28 21:53:31 UTC

Created attachment 83591 [details]
Causes the scrolling slow down

Sadly, 240 is still sluggish when the terminal is maximized.  Ok, to see the problem simply open a terminal, and cat the attached file.  When you scroll whilst a line that requires wrapping is visible, the thing becomes jerky and unresponsive, on an area where there are no wrapped lines, it all responds fine.  The length of the line doesn't appear to be important, because catting the same file when it wraps at 80 characters causes no noticable side effects to the scrolling.  As I say, 160 seemed alright to me, but people with larger resolutions may just hit the problem a little later on...  5:\

Comment 27 Mike Auty 2007-02-28 22:06:58 UTC

Sorry for the spam, just thought that I should report at 160 on the previously attached file (maximized on a 1600x1200 screen), the [OK] lines (in the middle just after "Emerging (2 of 3) gnome-extra/deskbar-applet-2.17.92") still cause the degradation in scrolling.

Comment 28 Chris Wilson 2007-02-28 22:31:01 UTC

Just to clarify, Mike your recommendation is that we set the cap to 160? Or do you wish to go lower?

Comment 29 Mike Auty 2007-02-28 23:36:33 UTC

It's really hard to tell.  I'd feel much more confident if we had a better understanding of exactly what's causing the problem, but given that we don't I'd say that if you really want to show no signs of lag whatsoever, cap it at a lower rate (probably not higher than 120, but I haven't done any further tests).  If you don't mind a small proportion of people (fglrx users) having a small amount of lag when they manually scroll, 160 would be fine.  Not much of a recommendation I'm afraid...  5:(  Do we have any stats on how much positive difference capping at 160 would make (instead of 80)?  If it doesn't provide a sizable advantage for most users, I'd probably just leave it as its previous value.  Karol, have you had a chance to try out any of the changes suggested here, it'd be nice to have a second opinion?

Comment 30 Chris Wilson 2007-03-01 00:30:44 UTC

As I recall, one of the primary motivators for the bump from ~80 to 540 was to reduce network traffic.

However, against HEAD there is no perceptible difference for longer runs (on my test platform, ssh forwarding over a 100Mb/s link) - i.e writing to the network is not the limiting factor.
Scrolling benchmark for a remote client:
80: 63s
160: 63s
1000: 63s

I am happy to set the cap to 80 as that would then be similar to the typical max run length that vte-0.14 could produce (i.e. a single row) and least likely to cause surprises.

Comment 31 Chris Wilson 2007-03-01 00:32:50 UTC

r1787: 2007-03-01  Chris Wilson  <chris@chris-wilson.co.uk>

	Bug 410534 – Slow content scrolling, takes 100% of CPU.

	* src/vtexft.c:
		Further reduce the cap to 80 after more testing on the
		broken fglrx driver.