GNOME Bugzilla – Bug 345284
gnome terminal is very slow
Last modified: 2010-01-13 19:25:30 UTC
Please describe the problem:
I just compared gnome-terminal (gt) to multi-gnome-terminal (mgt) by issuing

$ time seq 1 100000

in a fullscreen terminal window (same font size) on my old ppc machine. Result: gt is 4-5 times slower than mgt!

As a reference:
$ time seq 1 100000 >/dev/null
real 0m0.294s  user 0m0.284s  sys 0m0.004s

gt:    real 0m11.147s  user 0m0.316s  sys 0m0.224s
mgt:   real 0m2.518s   user 0m0.456s  sys 0m0.540s
xterm: real 0m48.954s  user 0m0.504s  sys 0m0.332s

As this is on kernel 2.6.x, the reason for xterm/gt being so slow, together with a way to work around it (and my original patch for mgt that led to this speed boost), can be found here:
http://linux.derkeiler.com/Mailing-Lists/Kernel/2004-04/0556.html
http://lkml.org/lkml/2004/4/3/77

I guess that this can be trivially adapted to gt making it

Steps to reproduce: as above
Actual results:
Expected results: gt to be fast
Does this happen every time? yes.
Other information:
For mgt the fix was to remove the busy loop in the terminal doing

while ( (saveerrno == EAGAIN) && (count = read (fd, buffer, 4096)) > 0) {
    saveerrno = errno;
    output stuff
}

If the terminal process gets too much CPU for too long, this makes the terminal spit out characters one by one. This in turn obviously turns off any jump scrolling. When I add a usleep of 5 ms in this loop, jump scrolling works nicely again (but it is still slower than the solution I proposed). This again underlines that _this_ is not a kernel scheduler bug. And yes, since probably all other terminals have their roots in xterm, every terminal is affected!

I fixed this issue in multi-gnome-terminal by using a 32 KB buffer. It keeps filling as long as new input arrives within 10 ms. Once the buffer is full or 10 ms have passed without input, the buffer is written out to the screen. This also makes it 2-3 times faster on kernel 2.4.

static void zvt_term_readdata (gpointer data, gint fd, GdkInputCondition condition)
{
[...]
+  while ( (count>0) && (select_retval==1) && (total_count<32768) )
+  {
+    count=0;
+    int maxread=32768-total_count;
+    if (maxread>4096)
+      maxread=4096;
+
+    count = read (fd, &buffer[total_count], maxread);
+    saveerrno=errno;
[...]
+    if (count>0)
+      total_count+=count;
+
+    FD_ZERO(&rfds);
+    FD_SET(fd, &rfds);
+    tv.tv_sec = 0;
+    tv.tv_usec = 10000;
+    select_retval = select(fd+1, &rfds, NULL, NULL, &tv);
[...]
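The coalescing policy in the patch (read chunks of at most 4096 bytes into a 32 KB buffer for as long as more input arrives within 10 ms, then flush) can be sketched as a standalone C function. This is a minimal sketch, not the mgt or vte code: the name coalesce_read and its parameters are hypothetical, and it assumes the caller only invokes it when fd is already readable (for example from a GDK input callback), because the first read(2) is not guarded by select().

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: gather pending terminal input into buf before
 * rendering. Reads chunks of at most 4096 bytes and, after each chunk,
 * waits up to timeout_us microseconds for more. Stops when buf is full,
 * the input goes quiet, or EOF/error is reached. Assumes fd is readable
 * on entry (the first read is not guarded by select). Returns the number
 * of bytes gathered (0 at immediate EOF), or -1 if the first read fails. */
static ssize_t coalesce_read(int fd, char *buf, size_t cap, long timeout_us)
{
    size_t total = 0;

    while (total < cap) {
        size_t want = cap - total;
        if (want > 4096)
            want = 4096;                 /* same chunk size as the patch */

        ssize_t n = read(fd, buf + total, want);
        if (n < 0 && errno == EINTR)
            continue;                    /* interrupted: retry the read */
        if (n < 0)
            return total > 0 ? (ssize_t)total : -1;
        if (n == 0)
            break;                       /* EOF: the writer went away */
        total += (size_t)n;
        if (total == cap)
            break;                       /* buffer full: flush now */

        /* Give the writer up to timeout_us to produce more input. */
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        struct timeval tv;
        tv.tv_sec = 0;
        tv.tv_usec = timeout_us;
        if (select(fd + 1, &rfds, NULL, NULL, &tv) != 1)
            break;                       /* quiet (or select failed): flush */
    }
    return (ssize_t)total;
}
```

With the figures from the patch, a caller would use coalesce_read(fd, buf, 32768, 10000) and then render buf[0..n) in one pass. The trade-off is the one described above: up to 10 ms of added latency per flush in exchange for far fewer, larger screen updates.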
vte implements a similar input coalescing mechanism.
Well, then it is not effective enough:

$ time find /usr >/dev/null
real 0m1.108s  user 0m0.277s  sys 0m0.833s

gnome-terminal (svn current, vte 0.14.1):
real 0m18.099s  user 0m0.607s  sys 0m1.360s

multi-gnome-terminal (extremely old 1.6.2...):
real 0m11.145s  user 0m0.597s  sys 0m2.080s
Could you also please try the vte from svn?
gnome-terminal (svn current, vte current):
real 0m15.453s  user 0m0.703s  sys 0m1.223s

OK, slightly faster (it varies by about 1 s between runs).
So you are comparing:
- No display at all (>/dev/null)
- vte 0.14.1 with antialiased fonts
- zvt with core X fonts
- vte current

Looks like apples to oranges. Rendering IS NOT FREE. And any terminal widget that caches the entire "find /usr" output before showing it is broken, IMO. If you don't want the display to hog the process, just don't send the output to the display. Are you saying that vte should simply wait for a second and then show you the final screen? At any rate, rendering more than 10 million chars to the display, with antialiased fonts and 12x22 pixels per glyph, can't be free. And vte is doing well enough not to slow down compiles anymore.
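Taking the figures in this comment at face value (10 million characters, 12x22 pixels per glyph), a rough count of glyph pixels to rasterize, if each character were drawn exactly once, is:

```latex
\[
10^{7}\ \text{glyphs} \times (12 \times 22)\ \frac{\text{pixels}}{\text{glyph}}
  = 10^{7} \times 264 = 2.64 \times 10^{9}\ \text{pixels}
\]
```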
Well, no. Believe it or not, scrolling *is* slow on this 2.16 GHz Core Duo machine when using plain vi, and by far compared to xterm/mgt. Sorry, but for a terminal I would already consider this a bug. Vte is much faster now but still far from perfect, and here the speedup trick for lots of output has of course helped. Not sure how gt does it, but it is still 1.5 times slower. The major bottleneck now seems to be the rendering itself (not prebuffering/outputting big chunks), but this should probably be discussed separately.
Yes, it's slow on my 1 GHz ppc too. Using aptitude is painful when scrolling down the package list: it goes bursty and the refresh is not fluid.
Time for a little evidence gathering. (I know speed is not paramount; it's just easier to measure than fluidity. :( ) In svn, I've added a perf directory to start gathering some performance test cases. At the moment it contains only a single vim scrolling benchmark, which just moves the cursor up and down the screen. Could you please compare the vim.sh output between xterm and vte? And perhaps suggest scripts that simulate your own workflow? And if you could measure burstiness/smoothness... ;)
Chris, thanks *a lot* for looking into this. Here we go. Note that mgt/aterm don't do Unicode. Also note that xterm does not do jump scrolling (it will lose any time seq 1 100000 benchmark).

For a measure of smoothness, just do the following:
1) set the keyboard repeat rate to the highest possible level
2) start vim in gt and read in some text file (e.g. :r !dmesg)
3) scroll up and see gt shiver, then do the same in xterm

mgt:
0.06user 0.00system 0:00.07elapsed 92%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+804minor)pagefaults 0swaps
132.22user 2.51system 2:16.09elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5544minor)pagefaults 0swaps

xterm:
0.06user 0.00system 0:00.07elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+812minor)pagefaults 0swaps
140.96user 2.58system 2:27.21elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5744minor)pagefaults 0swaps

gt:
0.05user 0.00system 0:00.06elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+797minor)pagefaults 0swaps
116.15user 2.70system 2:31.25elapsed 78%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+4934minor)pagefaults 0swaps

aterm:
0.05user 0.01system 0:00.06elapsed 101%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+800minor)pagefaults 0swaps
127.81user 2.20system 2:11.26elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5336minor)pagefaults 0swaps
"This is a wake-up call for g-t..." Hmm, so we spend more time making syscalls and more time asleep than everybody else. Out of interest, can you compare the vte app inside the source directory? [It'll take a bit of manual tweaking to make it fair - I had to adjust the window size and open a login shell]
./vte app:
0.05user 0.00system 0:00.06elapsed 91%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+798minor)pagefaults 0swaps
120.38user 2.57system 2:36.83elapsed 78%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+4934minor)pagefaults 0swaps

Same thing...
Whoops, I misread that completely: the time output reports that vim was given only 78% of the CPU, due to vte using the other 22%, compared to 1-3% for the other terminals. Do you happen to have sysprof handy? The profile would be quite informative. Thanks.
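The %CPU column that GNU time prints is (user + system) / elapsed. A small C helper (hypothetical names, not part of vte or GNU time) reproduces the figures from the vim.sh runs: for gt, (116.15 + 2.70) s / 151.25 s is about 78.6%, while for mgt (132.22 + 2.51) s / 136.09 s is about 99.0%. The remainder is wall-clock time during which vim was not running on a CPU.

```c
/* Share of wall-clock time a process spent on a CPU, in percent:
 * (user + system) / elapsed * 100, the quantity GNU time prints as %CPU. */
static double cpu_percent(double user_s, double sys_s, double elapsed_s)
{
    if (elapsed_s <= 0.0)
        return 0.0;                  /* avoid dividing by zero */
    return 100.0 * (user_s + sys_s) / elapsed_s;
}

/* GNU time prints elapsed time as m:ss.ss; convert it to seconds. */
static double elapsed_seconds(int minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}
```

GNU time shows these as whole percentages (78% and 99% in the runs above), so the fractional part is dropped in its report.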
A couple more points of reference:

Remote laptop:
xterm
72.60user 7.14system 2:25.71elapsed 54%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+2704minor)pagefaults 0swaps
g-t
69.22user 5.67system 1:38.49elapsed 76%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+2581minor)pagefaults 0swaps

Local machine:
xterm
21.72user 0.61system 0:23.71elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+2736minor)pagefaults 0swaps
g-t
21.87user 0.56system 0:25.37elapsed 88%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+2615minor)pagefaults 0swaps

The conclusion is that I need profiling data from a wider range of hardware...
Is it OK for the lines to 'shake' during the test?
On my 'slow' hardware, I lowered the number of calls to 10 and ran the test in 80x24 and 141x44:

- 80x24:
0.15user 0.05system 0:00.26elapsed 80%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+1753minor)pagefaults 0swaps
2.28user 0.08system 0:06.60elapsed 35%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+1813minor)pagefaults 0swaps

- 141x44:
0.16user 0.04system 0:00.30elapsed 66%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+1769minor)pagefaults 0swaps
4.36user 0.08system 0:16.89elapsed 26%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+1847minor)pagefaults 0swaps

Is it normal to get such a difference?
(In reply to comment #14)
> Is it OK for the lines to 'shake' during the test ?

Yes. The only way I found of forcing vim to actually redraw was by inserting a character into the line (but not deleting it); the character is then deleted before the next redraw.

" insert a character to force vim to update!
normal I 
redraw
normal dl

(In reply to comment #15)
> On my 'slow' hardware, i have lowered the number of calls to 10 and run the
> test in 80x24 and 141x44:
> - 80x24 :
> 2.28user 0.08system 0:06.60elapsed 35%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+1813minor)pagefaults 0swaps
> - 141x44 :
> 4.36user 0.08system 0:16.89elapsed 26%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+1847minor)pagefaults 0swaps
>
> Is is normal to get such a difference ?

One expects the time taken to scroll to scale linearly with the number of lines it has to scroll over. The length of the line should have little effect, as the number of characters in each line is the same in both cases, i.e. I'd expect 44/24 * 6.60 = 12.1 s as a first approximation. Do you have oprofile available on your ppc?

* goes off to see why it's not scaling as well as it should. :-(
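Spelling out that first-order estimate with the numbers from comment #15: scaling linearly with the number of rows predicts about 12.1 s for 141x44, whereas the measured 16.89 s is about 2.56 times the 80x24 run rather than 1.83 times:

```latex
\[
T_{141\times44} \approx \frac{44}{24} \times 6.60\ \text{s}
  \approx 1.83 \times 6.60\ \text{s} \approx 12.1\ \text{s},
\qquad
\frac{T_{141\times44}^{\text{observed}}}{T_{80\times24}}
  = \frac{16.89\ \text{s}}{6.60\ \text{s}} \approx 2.56
\]
```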
OK, installed oprofile now... This is the output from a ./vim.sh test in gt:

CPU: Core Solo / Duo, speed 1000 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Unhalted clock cycles) with a unit mask of 0x00 (Unhalted core cycles) count 100000
CPU_CLK_UNHALT...|
  samples|      %|
------------------
  2379302 41.6912 vim.basic
        CPU_CLK_UNHALT...|
          samples|      %|
        ------------------
          2378743 99.9765 vim.basic
              559  0.0235 anon (tgid:14589 range:0xb7f9b000-0xb7f9c000)
  2214068 38.7959 libfb.so
   302324  5.2975 fglrx_drv.so
   237915  4.1689 libc-2.3.6.so
   140788  2.4670 libvte.so.9.1.9
   112496  1.9712 Xorg
        CPU_CLK_UNHALT...|
          samples|      %|
        ------------------
           112422 99.9342 Xorg
               74  0.0658 anon (tgid:2260 range:0xb7f60000-0xb7f61000)
    92094  1.6137 libglib-2.0.so.0.1200.9
    32765  0.5741 libgdk-x11-2.0.so.0.1000.7
    31166  0.5461 libpthread-2.3.6.so
    29123  0.5103 libxaa.so
    22038  0.3862 libgobject-2.0.so.0.1200.9
    19739  0.3459 libncurses.so.5.5
    16979  0.2975 oprofiled
We discovered a cause of a massive slowdown with the fglrx driver (bug 410534). It would be useful to know whether HEAD is becoming usable for you, i.e. have we reached the point where g-t is comparable to other terminals? Thanks.
I have an ATI card in my iBook too, and I have never used the non-free binary fglrx driver.
Still the same thing:

$ time find /usr | wc -l
415482
real 0m1.074s  user 0m0.396s  sys 0m0.732s

gt:
$ time find /usr
real 0m24.574s  user 0m0.436s  sys 0m1.904s

mgt:
$ time find /usr
real 0m10.546s  user 0m0.604s  sys 0m2.236s

Will you happen to be near Berlin, Germany any time soon, so that I can demonstrate what I mean live?
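Normalising these runs by the 415482 lines that find /usr printed (as counted by wc -l above) gives a per-line cost; this assumes the line count was the same in every run:

```latex
\[
\frac{24.574\ \text{s}}{415482\ \text{lines}} \approx 59.1\ \mu\text{s/line}\ (\text{gt}),
\qquad
\frac{10.546\ \text{s}}{415482\ \text{lines}} \approx 25.4\ \mu\text{s/line}\ (\text{mgt}),
\qquad
\frac{24.574}{10.546} \approx 2.33
\]
```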
Are you using the same font with the same antialiasing for both, at the same size?
mgt uses zvt, which does not do antialiasing (aa); it uses core fonts, too. All this is IIRC, as Google does not show me a release more recent than 2002.
Well, if you think it's just the rendering that's causing us to be slow...

$ VTE_BACKEND=null time gnome-terminal --disable-factory -x echo
$ find /usr; VTE_BACKEND=null time gnome-terminal --disable-factory -x find /usr

(and compare against xterm and konsole, as they also handle Unicode)
Is anyone still interested in this bug? I tried compiling mgt, but it's so old that it's almost impossible without a time machine.
Closing as there's nothing useful to work on here.