GNOME Bugzilla – Bug 721944
vte is slow
Last modified: 2014-06-16 12:08:29 UTC
$ find / -mount > test $ wc -l test 319658 test $ time cat test master: 50s 0.28.2 (last gtk2 release): 51s 0.16.14: 50s 0.12.2: 9s xterm 293: 10s konsole (kde 4.11): 5s urxvt 9.18: 2s (older vte versions need a bit of massaging to autogen and compile). So even a really old vte (0.12) is rather slow, and somewhere between 0.12 and 0.16 it got much slower.
0.14.0: 10s 0.15.0: 9s 0.16.0: 6s 0.16.7: 5s 0.16.9: 5s 1f782bd: 5.2s 4bab092: 5.3s c9cdd33: 5.3s e6c2187: 5.3s 3f6b100: 5.3s 49a0cdf: 51s 0.16.10: 51s https://git.gnome.org/browse/vte/commit/?h=vte-0-36&id=49a0cdf11d75459c34131d409d4b8cf3f3090f81 is the first bad commit commit 49a0cdf11d75459c34131d409d4b8cf3f3090f81 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Fri Nov 16 14:22:22 2007 +0000 Tweak to read across chunk boundaries whilst still maintaining fairness 2007-11-16 Chris Wilson <chris@chris-wilson.co.uk> * src/vte.c (vte_terminal_io_read), (vte_terminal_init), (process_timeout), (update_repeat_timeout), (update_timeout): Tweak to read across chunk boundaries whilst still maintaining fairness between multiple terminals and refresh rate targets. svn path=/trunk/; revision=1978
Created attachment 266032 [details] [review] undo the slowdown This is just a blind revert of that patch and update for current git. Indeed a ~10x improvement.
Fast version: vte_terminal_io_read() usually returns with TRUE (== try again). Slow version: most of the time "again = bytes < max_bytes" sets it to FALSE. Changing the retval to false in the fast version makes it just as slow as expected, so we're on the right track.
Created attachment 266040 [details] [review] The Real Fix™ Thanks ChPe for git bisecting :) For some reason it didn't occur to me that it could have been a single faulty commit, I assumed it was a degradation over time. input_bytes and max_input_bytes seem to be used for input throttling (handling how many bytes to process before doing UI update or letting others do their stuff or whatever). VTE_MAX_INPUT_READ is about how many bytes we can physically read. Thou shalt not mix 'em. It's not clear to me how active_terminals can have more than 1 entries, since even a multi-window multi-tab gnome-terminal benefits from the fix. Nevermind...
So with either the straight revert, or the Real Fix™ , I only get a 4x speedup back (from ~50s to ~12s) on vte-0-36. Also, on a more involved testcase (img.sh animated.gif > test; time cat test) the revert is 3x faster than the Real Fix™. So I'm leaning towards the revert instead of the 1-line fix... Behdad?
I don't know :(.
I just figured out that "active_terminals" are actually the terminals that are doing some work, not just sitting idle. The Real Fix™ points out that only the "single active terminal" branch was faulty. Correspondingly, current g-t/vte is painfully slow if you "cat test" in a tab and maybe have other tabs/windows sitting idle. But execute "cat test" in two tabs at the same time, and tadaaaaaa, it's quite fast! :) The intent of Chris's patch is getting clearer to me. Previously if let's say normally we read 1000 bytes from a stream at a time (it was subject to the time_process_incoming() estimation) we also did so in case of multiple active tabs. Chris says if incoming estimation says we should read 1000 bytes but there are 3 active tabs then let's read 333 bytes only, but of course tricks the metrics of the input timing estimation accordingly (faking we read 1000 bytes, or something alike). I'm not sure if this is a good idea -- reading too much data would be unfair against non-vte widgets in presence of many active vtes, although reading very small amount of data gives an overhead again. Also I don't get why he actually divices by n-1 instead of n (in the example he actually seems to read 500 bytes in case of 3 active terminals). The second half of his patch seems to be the right thing to do. Start max_input_bytes from a more reasonable value, and do time_process_incoming() on all terminals, not just the last active one. I find his code harder to understand than the original and a good target to be cleaned up. Including: - It has a double loop, the outer one looping only if we'd return by TRUE (meaning "yes call us again"), instead of just simply having the inner one as the original version did and return and let the caller call us again. This would simplify the code and probably not make it noticable slower. - "max_bytes" has a weird temporary overloading in semantics. It's also unclear to me why he divides by n-1 instead of n. - The division by n-1 can, in case of small max_input_bytes and many active terminals, give us 0 and hence cause no bytes being read. Could this be related to bug 616927 and bug 629456??? - With the removal of multiple_active, some runtime if()s can be changed to compile-time #ifs, or maybe the unused code path could be totally eliminated. That being said, my opinion is: I don't think we can make a proper fix today, the story is way more complicated than that, so it'll be delayed for 0.35.2. The proper fix will probably take bits from the old version, bits from the new version (e.g. multiple_active probably shouldn't be resurrected), and other newly written code too. Apparently both ChPe and me are going to work on that, that's good. IMO let's apply the Real Fix™ now, because this is clearly an innocent typo in Chris's work and clearly he intended to have it that way, and makes today's 0.35.1 release substantially faster with the least chance of breaking or regressing anything good that Chris did in his patch that we don't understand yet (e.g. fairness against other non-vte widgets and stuff). We'll get to understand and improve everything for 0.35.2.
I'm really glad that you two are working on this and that someone is finally understanding what Chris meant and improving on it :). Thanks!
I think I was wrong assuming that a value of max_bytes==0 would cause troubles. Again I mixed up the values corresponding to the physical read, and the values used for timing. There's no problem at all if max_bytes is 0, we still do read bytes properly.
Committed the forward fix. The actual real work is just about to begin :)
I'm closing this one. After fixing this bug as well as bug 730732, vte is no longer slow. There's another bug 722065 still open to remind us that it could still be made a bit faster :)