GNOME Bugzilla – Bug 410534
Slow content scrolling, takes 100% of CPU.
Last modified: 2007-03-01 00:32:50 UTC
I have noticed that in 0.15.3 scrolling gnome-terminal's content using mouse is very slow and what's worse uses 100% of CPU. After downgrading to 0.14.1 everything is fine... Gnome-terminal version is 2.17.91, distribution is Ubuntu Feisty, I am using ATI closed source video drivers, direct rendering is enabled, all other applications are working fine.
Created attachment 83066 [details] sysprof screenshot of 0.14.1 libvte - this version works fine.
Created attachment 83067 [details] sysprof screenshot of 0.15.3 libvte
Karol, you can save the profiles either from the File menu or from the SaveAs toolbar button. Could you describe your testing methodolgy as all it immediately seems to be is the change between 0.14 and 0.15 to reduce the number of lines that the mouse wheel scrolls by.
As of methodology....there is none :) I have simply noticed that is works noticeably slower. For example, if I scroll terminal window's content CPU usage jumps to about 100%. In 0.14.x CPU usage is about 16%. These numbers are from htop which was run in one terminal, the other one contained output from /var/log/messages and window was scrolled using mouse wheel. Another example. Gnome-terminal with running fgl_glxgears, when fgl_glxgears outputs current FPS rate then rendering stops for a fraction of second, nothing similar happens in 0.14.x.
I've reviewed the mouse scrolling code and tweaked HEAD slightly. r1716: 2007-02-21 Chris Wilson <chris@chris-wilson.co.uk> cf Bug 410534 – Slow content scrolling, takes 100% of CPU. * src/vte.c: (vte_terminal_scroll), (vte_terminal_set_scrollback_lines): Operate on scroll delta directly as adjustment->value updates are not instantaneous and we may have several scroll events before the next update. However, judging from your comments, I think there may be a severe renderering performance regression wrt 0.14.x. Could you please profile HEAD and see if the situation has improved? Thanks.
I also just built 0.15.3 (and have a radeon using fglrx with DRI acceleration enabled) and am also experiencing huge delays/CPU usage when scrolling. The scrolling doesn't have to be using the mousewheel, just dragging the scrollbar (or having a program that regularly outputs to stdout) causes the extreme slow downs. It seems mostly to occur after the scrollback buffer has filled to a sufficient size. So if there is only a small amount that can be scrolled it's all pretty fast no matter how far you want to go, however if the buffer is filled with text even moving up a couple of lines seems to exhibit the symptoms. I haven't yet been able to try out HEAD, but I'll see if I can get it built soon and test it out...
Mike, Karol - any insights you can provide today would be most helpful. We're due to make the next release at midnight and I'd like to fix this regression before then :-) Thanks.
I had no time yet to test the HEAD, but at least managed to build it... But I have one more test for you: Put some files with long names (at least 80 characters) in a folder and list them "ls -al", then try scrolling the window content. I have noticed that when it comes to scroll these long file names then CPU usage jumps to 90%, when there isn't much text in a line then it works acceptably (CPU usage is about 30%). I can't test now with 0.14.x version (will do it after work) but I am pretty sure that there was no such slowdown...
That would fit with the few tests I've managed. Trying to compile something (and thus having the extremely long output lines) ran really slowly, whereas compilation on vte 0.14 was fine. I'll have more of a chance to test this evening once I'm back from work...
For testing, I did $ seq 1 1e7 | tr \\n \ > long-lines.txt; time cat long-lines.txt Apart from the unfortunate stutter where we tried to read at ~80MiB/s from the child process but could only process about ~5MiB/s, there was no apparent slow down. Both vte-0.14 and HEAD processed consumed similar amounts of CPU time, except that HEAD completed in less than half the time of vte-0.14. After imposing a heuristic to prevent processing overruns (aiming to keep the refresh and scrolling smooth), there is little to distinguish the behaviour of the two versions on my boxen. Similary with manually scrolling with long lines in the buffer. Can you try the test above to check that it does indeed show a difference between vte-0.14 and vte-0.15? [i.e. that it triggers this regression! ;-]
I can confirm that the test line you provided triggers the regression. The first run (vte-0.14.2) took about 47 seconds. The second run (vte-0.15.3) took 7 minutes 20 seconds (all in real: user 0m0.007s, sys 0m0.707s). I haven't yet had a chance to test HEAD yet.
Sadly, it looks like we missed the last check-in. About an hour ago 0.15.4 was tagged in SVN, and that version took 10 minutes and 2 seconds, so it looks like the regression hasn't been fixed. On another box, this time with an nvidia card (with DRI), 0.14.2 took 48 seconds, whilst 0.15.3 took 44 seconds, so it seems to be rather specific to the fglrx driver, if that helps figure out the problem?
Ouch, an order of magnitude slower! I think I may have upset the fglrx driver... For comparison, can you try the same long-lines test but launch g-t with: VTE_BACKEND=ft2 gnome-terminal --disable-factory which tells vte to use the FreeType2 backend and not libXft. Double-checking against vte-0.14 would be useful, except you need to use: VTE_USE_XFT=0 gnome-terminal --disable-factory instead. Along that path my prime suspect is the introduction of using clipping alongside libXft. You can check this by commenting out the contents of vtexft.c::_vte_xft_clip(). Hmm, don't have access to a radeon+fglrx here (the only radeon I have is on a ppc) so I'm going to need plenty of feedback to find an acceptable workaround.
Yep, running those commands seemed to make the difference. 0.15.4 with VTE_BACKEND=ft2 took 36 seconds. 0.15.4 without VTE_BACKEND=ft2 took 8 minutes 48 on the second run. 0.14.2 with VTE_USE_XFT=0 took 50 seconds. I'm afraid I can't test much more tonight. I'll try commenting out the xft_clip code tomorrow sometime...
uhm, this could become a potential release blocker. :-(
Mike (or anyone in a position to test ;), Behdad wishes to make a new release shortly, primarily to push out the fix for bug 412562. Could you endeavour to see if the slow down is solely due to the _vte_xft_clip()? If so, we can then include the work-around and hopefully have a happy user base in time for gnome-2.18. :-)
Created attachment 83507 [details] [review] Comment out all vte_xft_clip code... Ok, please see the attached patch, which simply comments out the _vte_xft_clip() function. Sadly it appears only to reduce the problem very slightly, but still doesn't fix it by a long shot (and at each version bump, the figures seem to be going up?). 0.15.5 XFT - No patch - 14m5.999s 0.15.5 XFT - Patch - 13m30.932s 0.15.5 FT2 - Patch - 0m36.597s I'll be quite happy to test any patches you may have, or if there's any other tests I can try please let me know. I'd also like to see this hit on it's head (if it's of any use, I tried a very slow machine with an i810 card and DRI, and the 0.15.3 was faster than 0.14.2)...
Thanks Mike for running that test. I really hoped for a nice simple explanation! Another major change was the increasing of max number of glyphs passed to libXft, can you try reducing it? Index: src/vtexft.c =================================================================== --- src/vtexft.c (revision 1775) +++ src/vtexft.c (working copy) @@ -41,6 +41,9 @@ #define DPY_FUDGE 1 +#undef VTE_DRAW_MAX_LENGTH +#define VTE_DRAW_MAX_LENGTH 80 + struct _vte_xft_font { guint ref; Display *display;
I think that might've been it! Applying the above patch to 0.15.5 gave me the following: real 0m34.128s Which seems to have cured it! 5:)
* swears under his breath. Ok, now that I know what the cause was, time for a bit of googling. Normal service will be resumed shortly ;-)
Bah, google revealed nothing! Mike, can you try different max lengths to find the turning point in the curve? Thanks.
Right, here's some preliminary results. It seems there's very little impact up until about 320 after which it starts taking off. Given that it seems to be such a sharp increase, I'm wondering if it's some kind of memory exhaustion and therefore might be different for people with different amounts of video memory or slightly different hardware? These figures could do with testing by other people, but for me, I'd say 320-340 was reasonable. VTE_DRAW_MAX_LENGTH = 80 real 0m33.4s VTE_DRAW_MAX_LENGTH = 160 real 0m33.4s VTE_DRAW_MAX_LENGTH = 320 real 0m33.4s VTE_DRAW_MAX_LENGTH = 340 real 0m33.5s VTE_DRAW_MAX_LENGTH = 350 real 0m33.9s VTE_DRAW_MAX_LENGTH = 360 real 0m35.8s VTE_DRAW_MAX_LENGTH = 370 real 0m40.9s VTE_DRAW_MAX_LENGTH = 380 real 0m57.5s VTE_DRAW_MAX_LENGTH = 400 real 3m18.1s VTE_DRAW_MAX_LENGTH = 480 real Too long VTE_DRAW_MAX_LENGTH = 540 real Whay too long Above 340 it felt as though the numbers were sluggish up to 1e1, and afterwards went up at a reasonable speed. At 340 and less the speed seemed pretty constant throughout, although maybe that's just my eyes. 5;) Hope this helps!
Thanks Mike for the analysis. r1777: 2007-02-28 Chris Wilson <chris@chris-wilson.co.uk> Bug 410534 – Slow content scrolling, takes 100% of CPU. Submitting long glyph runs was causing a dramatic (10x) slow down in the fglrx xserver. * src/vtedraw.h * src/vtexft.c (_vte_xft_draw_text): Cap the max glyph run length to 300.
With not much reason, I'd go for 240... (as in 256 - 16).
I hate to say it, but I also found that at 320 (which is what I left my final patch value at), it could still be a little sluggish dragging the scrollbar on a long page of compilation results (rather a common occurance on Gentoo!), but reducing it down to 160 left the scrolling extremely responsive. I'll give 240 a go and report back the results...
Created attachment 83591 [details] Causes the scrolling slow down Sadly, 240 is still sluggish when the terminal is maximized. Ok, to see the problem simply open a terminal, and cat the attached file. When you scroll whilst a line that requires wrapping is visible, the thing becomes jerky and unresponsive, on an area where there are no wrapped lines, it all responds fine. The length of the line doesn't appear to be important, because catting the same file when it wraps at 80 characters causes no noticable side effects to the scrolling. As I say, 160 seemed alright to me, but people with larger resolutions may just hit the problem a little later on... 5:\
Sorry for the spam, just thought that I should report at 160 on the previously attached file (maximized on a 1600x1200 screen), the [OK] lines (in the middle just after "Emerging (2 of 3) gnome-extra/deskbar-applet-2.17.92") still cause the degradation in scrolling.
Just to clarify, Mike your recommendation is that we set the cap to 160? Or do you wish to go lower?
It's really hard to tell. I'd feel much more confident if we had a better understanding of exactly what's causing the problem, but given that we don't I'd say that if you really want to show no signs of lag whatsoever, cap it at a lower rate (probably not higher than 120, but I haven't done any further tests). If you don't mind a small proportion of people (fglrx users) having a small amount of lag when they manually scroll, 160 would be fine. Not much of a recommendation I'm afraid... 5:( Do we have any stats on how much positive difference capping at 160 would make (instead of 80)? If it doesn't provide a sizable advantage for most users, I'd probably just leave it as its previous value. Karol, have you had a chance to try out any of the changes suggested here, it'd be nice to have a second opinion?
As I recall, one of the primary motivators for the bump from ~80 to 540 was to reduce network traffic. However, against HEAD there is no perceptible difference for longer runs (on my test platform, ssh forwarding over a 100Mb/s link) - i.e writing to the network is not the limiting factor. Scrolling benchmark for a remote client: 80: 63s 160: 63s 1000: 63s I am happy to set the cap to 80 as that would then be similar to the typical max run length that vte-0.14 could produce (i.e. a single row) and least likely to cause surprises.
r1787: 2007-03-01 Chris Wilson <chris@chris-wilson.co.uk> Bug 410534 – Slow content scrolling, takes 100% of CPU. * src/vtexft.c: Further reduce the cap to 80 after more testing on the broken fglrx driver.