GNOME Bugzilla – Bug 161342
Vte slow with mc and vim
Last modified: 2007-01-18 16:20:39 UTC
Expand gnome-terminal to fullscreen, run mc. Change to a directory with so much files to fill the mc panel. Try to list through the files in the panel using arrow keys. See how the scrolling of the selected line is slow compared to other terminals (konsole, xterm).
What version of vte/gnome-terminal are you using? I don't see any difference when scrolling in xterm. I used /usr/lib containing 2400 files and it was just about the same.
I use gnome-terminal-2.8.0-2/vte-0.11.11-15 from Fedora Core development. The effect was much worse with the gnome-terminal/vte form Fedora Core 3 (almost unpatched vte-0.11.11) as in the current development packages there are some added perf patches which make the scrolling faster. However the main difference against konsole/xterm is that the selected file line moves without skips on them but in the vte/gnome-terminal it starts without skipping and then it starts to skip over files as the redraw is not fast enough. It seems to me that the version without the patches prefered redraws against speed so actualy the movement was slow and it stopped a long time after releasing the cursor key. This effect is pronounced if you use a really high resolution+small font.
gnome-terminal is indeed a bit slower than xterm and (for example) aterm. I agree that it could be better. I can kind of reproduce the 'mc file skipping' behaviour the submitter mentioned using gnome-terminal-2.10.0 / vte-0.11.13 (debian unstable) in fullscreen mode (1280x1024, nvidia gf2 with opensource nv driver, athlon 2200). It's not a very smooth experience, but it's usable nevertheless. The only difference with the submitter is that if I stop pressing the cursor key the scrolling stops immediately (as expected). Maybe the submitter could upgrade (or already has?) to a more recent version of gnome-terminal and vte to see if there are any improvements. Any information regarding hardware setup might be useful too.
I'm using now the vte-0.11.13 with basically the same result as in comment 2. The slow response to keyup event is now not too pronounced but sometimes it happens - HW is Dell Latitude D800 (Pentium M, Nvidia GeForce, 1680x1050). Konsole or xterm work fine.
Some of our scrolling issues were fixed in 0.11.14 by shorcircuiting an optimization for scrolling the whole terminal content. Could you see if this helps?
It is better and smoother than in 0.11.13 but still not as smooth as konsole in KDE 3.4.2. Maybe interesting observation: the selected line scrolls smoother on the bottom of the terminal window (almost fine) than on the top of it. I've also tried simple test with timing cat with a large file on the terminal and with the konsole it takes about 40% shorter time than with gnome-terminal. Xterm is of course even faster but this is expected as it doesn't use the freetype fonts and so on.
Could you please try it now, with vte 0.11.20? I'm quite sure that this could be closed by now.
Actually it the selected bar movement is still not as smooth as in konsole. The bar moves fast but it skips lines. It was smoother (also probably slower but I am not sure with that) in vte-0.11.18.
I'm still working on this. I've found the cause, working on a patch.
*** Bug 96879 has been marked as a duplicate of this bug. ***
What's hapenning is that currently vte keeps a bounding box of all updates, and invalidates (aka updates) that bounding box. This means, when you move your cursor in vim, the rectangle including the cursor and the part of the status bar that shows cursor location is updated. That's about half of the screen on average.
Just wondering if any progress has been made on this. I'm still seeing it using vte-0.12.0-1 on FC5. It doesn't even need to be vim. I have very slow jerky scrolling using less and more as well. By the way I started noticing this problem much more when I started using a dual monitor setup. I'm not sure why. Also, while scrolling up and down through a file I see the Xorg process use up an entire CPU.
Created attachment 79945 [details] [review] Invalidate only the cells. Attached is a patch that only invalidates the cells and not the bbox. Unfortunately, it also contains many [small] speedups for hotspots whilst callgrinding - although on my system the normal process is display limited.
Wow, your kicking some serious a$$ here! I love these stuff, however, there's no way I can review and commit this as one patch. Please break it up into logical units. I suggest you take a look at git-svn and stacked-git. They make the life way easier to manage patchsets. And probably open a separate bug for the other bits. Just a few comments: * In your configure.in changes, the "[default=...]" is wrong, as square brackets are quotation chars in configure.in. You need to do "@<:@default=...@:>@" for example. * If you add new asserts, you should be really really really sure that if it hits we better crash. I personally hate asserts and prefer to do a warning of some kind and proceed. And g_error may be preferrable to assert, but if looking at all the callers, a condition really should be true, assert is fine. * As for the main point of this bug, I have a patch locally that uses a GdkRegion to invalidate the cells only, and not the bbox, but the overhead of updating the region made it slower for most purposes, although the case of vim and mc was improved. Your approach may be different, but you may want to measure speed difference of just that part of the patch. I use "ls -R /usr/lib; time ls -R /usr/lib" as one measure. Do you have interesting before/after numbers?
I'll be back in a bit... Still learning git (having recently discovered git-svn), and about to throw st-git into the mix as well. Off the top of my head, I recall that the repeated invalidates did have a significant impact on the timings, but that most of the invalidates were for 0x0 cells. Lately, all my timings have been within measurement error, except when callgrinding ie not display limited.
Created attachment 80315 [details] [review] Invalidate only the cells (and nothing else!) Clean version. And now for some timing info... time hexdump 16M.wmv vanilla: local 0m27.228s remote 3m9.866s vim-speedup: local 0m27.357s remote 3m2.644s all (more performance tweaking): local 0m24.302s remote 2m31.238s ie we introduce negligble overhead.
Nice. Lets get this in for next week's release. A couple of points: - Now that you are keeping a list yourself, you can switch from GdkRegion to GdkRectangle. A GdkRegion is a list of (nonoverlapping) GdkRectangles. - I just skimmed the patch. You are generating one rect per consecutive cells, right? - Where are the other speedups? I like to get them in before next week and spend the rest of the cycle bugfixing them.
(In reply to comment #17) > - Now that you are keeping a list yourself, you can switch from GdkRegion to > GdkRectangle. A GdkRegion is a list of (nonoverlapping) GdkRectangles. I'd considered whether GdkRegion was needless overhead, but then I'd still need to g_slice_alloc(GdkRectangle) [4 ints for the rectangle vs 6 ints+pointer for the region] and then transform them into GdkRegions during the expose processing. Or I may have missed something. > - I just skimmed the patch. You are generating one rect per consecutive > cells, right? Yes. At first I invalidated each individual cell and noticed a performance hit (obviously!). I then went off to recoup that hit from other hotspots in the code. As I cleaned up the patch, I realised how stupid that was and needed to only reset the bbox if we moved the cursor during an escape sequence... Actually that could be if we moved the cursor outside of our current bbox. > - Where are the other speedups? I like to get them in before next week and > spend the rest of the cycle bugfixing them. Many and varied, but none major. I think from the performance stand-point, the biggest wins are by not using zero-terminated GArrays (which of course we are adding to one unichar at a time!). I am working up a series of patches for review.
(In reply to comment #18) > (In reply to comment #17) > > - Now that you are keeping a list yourself, you can switch from GdkRegion to > > GdkRectangle. A GdkRegion is a list of (nonoverlapping) GdkRectangles. > I'd considered whether GdkRegion was needless overhead, but then I'd still need > to g_slice_alloc(GdkRectangle) [4 ints for the rectangle vs 6 ints+pointer for > the region] and then transform them into GdkRegions during the expose > processing. Or I may have missed something. gdk_window_invalidate_rect() > > - I just skimmed the patch. You are generating one rect per consecutive > > cells, right? > Yes. At first I invalidated each individual cell and noticed a performance hit > (obviously!). I then went off to recoup that hit from other hotspots in the > code. As I cleaned up the patch, I realised how stupid that was and needed to > only reset the bbox if we moved the cursor during an escape sequence... > Actually that could be if we moved the cursor outside of our current bbox. Yeah, I did one per cell and got the overhead too, so it sat there in my checkout to do the grouping. Thanks for doing this. > > - Where are the other speedups? I like to get them in before next week and > > spend the rest of the cycle bugfixing them. > Many and varied, but none major. I think from the performance stand-point, the > biggest wins are by not using zero-terminated GArrays (which of course we are > adding to one unichar at a time!). I am working up a series of patches for > review. Yeah, that avoids memory fragmentation that is a major problem with vte right now. We kinda need our own memory manager for the rows, one that allocates large chunks of memory (multiples of megs) per vte widget, and allocates rows from this only. We can then walk it from time to time and pack the rows, reclaiming any holes in the memory, like garbage collectors do.
(In reply to comment #19) > (In reply to comment #18) > > (In reply to comment #17) > > > - Now that you are keeping a list yourself, you can switch from GdkRegion to > > > GdkRectangle. A GdkRegion is a list of (nonoverlapping) GdkRectangles. > > I'd considered whether GdkRegion was needless overhead, but then I'd still need > > to g_slice_alloc(GdkRectangle) [4 ints for the rectangle vs 6 ints+pointer for > > the region] and then transform them into GdkRegions during the expose > > processing. Or I may have missed something. > > gdk_window_invalidate_rect() Internally it does gdk_region_rectangle()...gdk_region_destroy(). So we gain by fitting into the smaller GSlice magazine, but lose on the extra g_slice_alloc(). > > > - Where are the other speedups? I like to get them in before next week and > > > spend the rest of the cycle bugfixing them. > > Many and varied, but none major. I think from the performance stand-point, the > > biggest wins are by not using zero-terminated GArrays (which of course we are > > adding to one unichar at a time!). I am working up a series of patches for > > review. > > Yeah, that avoids memory fragmentation that is a major problem with vte right > now. We kinda need our own memory manager for the rows, one that allocates > large chunks of memory (multiples of megs) per vte widget, and allocates rows > from this only. We can then walk it from time to time and pack the rows, > reclaiming any holes in the memory, like garbage collectors do. Now that's more work than I've done here! I only meant using g_array_new(FALSE, FALSE, ...) instead of g_array_new(TRUE, TRUE, ...). I have however seen that bug report [bug #342338] and have myself cursed the lack of reuse of the RowData etc - most noticable is the amount of time we spend in g_realloc due to vte_g_array_fill.
r1429: 2007-01-17 Chris Wilson <chris@chris-wilson.co.uk> Only invalidate groups of inserted chars and not the whole bbox. This speeds up applications like vim which maintain a status line at the bottom of the screen. Fixes bug 161342. * src/vte-private.h: * src/vte.c: (update_regions), (_vte_invalidate_cells), (_vte_invalidate_all), (_vte_terminal_scroll_region), (vte_terminal_process_incoming), (reset_update_regions), (remove_update_timeout), (update_repeat_timeout), (update_timeout):
Thanks.
Humm, still seeing it. configure vte with --enable-debugging and run with VTE_DEBUG_FLAGS=updates ./vte, then vim and move cursor.
When adding the group bbox, I reset it to the current cursor (cut'n'paste) which was not desired. Please check r1451: 2007-01-18 Chris Wilson <chris@chris-wilson.co.uk> Bug 161342 revisited. Reset the box to +-inf and not the current cursor, otherwise you just reproduce the slow behaviour (an overly large invalidation area). * src/vte.c: (vte_terminal_process_incoming): Reset bbox to +-inf. [I'm going to get some sleep before checking through the possible accessibility regression.]
Works like a charm! Thanks. I see some very broken behavior on x86_64. Will go through the compiler warnings to see if they help.
Typical, the machine that's off the network. I didn't think I introduced any 64bitisms, but...
My fault. Working after "make install".