After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 771899 - VTE scrollbar update delayed
VTE scrollbar update delayed
Status: RESOLVED FIXED
Product: gtk+
Classification: Platform
Component: .General
3.22.x
Other Linux
: Normal major
: ---
Assigned To: gtk-bugs
gtk-bugs
Depends on:
Blocks:
 
 
Reported: 2016-09-23 23:17 UTC by Egmont Koblinger
Modified: 2018-02-08 12:55 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
g-t workaround (1.17 KB, patch)
2018-01-08 20:25 UTC, Egmont Koblinger
none Details | Review
vte workaround, v1 (1.24 KB, patch)
2018-01-18 21:20 UTC, Egmont Koblinger
none Details | Review
vte workaround, v2 (1.33 KB, patch)
2018-01-18 21:35 UTC, Egmont Koblinger
none Details | Review
vte workaround, v2b (1.44 KB, patch)
2018-01-19 19:48 UTC, Egmont Koblinger
none Details | Review
vte, hopefully proper fix (2.04 KB, patch)
2018-01-20 16:46 UTC, Egmont Koblinger
committed Details | Review

Description Egmont Koblinger 2016-09-23 23:17:58 UTC
Started happening with the Xenial -> Yakkety beta (Gtk+ 3.18 -> 3.20) update, with both the shipped vte+g-t (0.44.2/3.20.2) and current git (0.47/3.23):

Pretty often the scrollbar is not updated, or is one "step" behind (whereas "step" is something that modifies the scrollbar's position again, or focus loss...).

To reproduce:

- Execute "seq 200" (or anything similar). The scrollbar is at the bottom, as expected.

- Press Shift+PageUp. The scrollbar moves up a bit, as expected.

- Press Shift+PageUp again. The terminal contents are properly updated, yet the scrollbar remains where it was.

- Press Shift+PageDown this time. The scrollbar goes upwards, that is, it just adjusts to the previous Shift+PageUp keypress and now ignores the current Shift+PageDown.

- and so on...

Another similar symptom that happens quite frequently is that you scroll back, press Enter, and the scrollbar doesn't go back to the bottom.

At this moment I suspect a Gtk+ bug but I'm not sure yet.

(Or is maybe bug 70908 relevant?)

Christian or Debarshi (or anyone else), do you see this behavior on Fedora?
Comment 1 Egmont Koblinger 2016-09-23 23:20:17 UTC
Sorry, wrong bug linked... I meant to say:

Or is maybe bug 709089 relevant?
Comment 2 Egmont Koblinger 2016-09-23 23:24:57 UTC
Same bug happens with other libvte2.91-based apps as well (Terminix, xfce4-terminal, mate-terminal)... So the bug must be either in Vte or in Gtk+.
Comment 3 Christian Persch 2016-09-24 07:49:31 UTC
I can repro on F24 with gtk+ 3.20 and vte git master. I suspect it's a gtk bug, related to the warnings printed on console:

Gtk-WARNING **: Allocating size to GtkScrollbar 0x9afa168 without calling gtk_widget_get_preferred_width/height(). How does the code know the size to allocate?

These warnings have been popping up all over gnome in many different applications...
Comment 4 Egmont Koblinger 2016-09-24 08:42:35 UTC
Given such an explicit warning by gtk, it could be an intentional change that we have to adapt to...
Comment 5 Christian Persch 2016-09-24 14:44:08 UTC
Using VTE_DEBUG=adj, I see that the adjustment is updated correctly, so this does seem to be a gtk+ problem.
Comment 6 Christian Persch 2016-10-28 09:14:44 UTC
Possibly dup of bug 765410.
Comment 7 Egmont Koblinger 2016-10-28 10:55:34 UTC
Yeah let's mark so, worst case they'll undup.

*** This bug has been marked as a duplicate of bug 765410 ***
Comment 8 Egmont Koblinger 2017-11-09 00:28:44 UTC
Unduping as per bug 765410 comments 29-31.

Reassigning to GTK+, since as per comment 5 over here we suspect this is a GTK+ issue.

At this moment I'm on Ubuntu Artful (GTK+ 3.22.24), and I can still reproduce the issue using the exact same steps as in the original report.

Thanks!
Comment 9 Matthias Clasen 2017-11-09 00:47:44 UTC
not happening here. Every shift-PageUp/Down keypress is visibly moving the scrollbar here. gtk+ 3.22.25.
Comment 10 Egmont Koblinger 2017-11-09 00:55:01 UTC
I can also reproduce with another user, on GNOME Shell and Wayland.

Any idea what the relevant difference could be? (Could you maybe please try with a new user?) 

Was there any possibly relevant change between gtk+ 3.22.24 and .25 so that I should try updating?

Christian, can you still reproduce on Fedora?
Comment 11 Christian Persch 2017-11-09 09:41:44 UTC
Yes, I can still repro. Version is gtk3-3.22.17-2.fc25.x86_64 .
Comment 12 Egmont Koblinger 2018-01-07 23:18:48 UTC
With a manually compiled GTK+ (gtk-3-22), every time the widget initiates a scroll (e.g. newline printed at bottom, shift+pgup, touchpad scroll - but not dragging the scrollbar) the following is printed:

(gnome-terminal-server:2003): Gtk-WARNING **: TerminalWindow 0x55c58be7a3e0 is drawn without a current allocation. This should not happen.
(gnome-terminal-server:2003): Gtk-WARNING **: GtkBox 0x55c58be4f170 is drawn without a current allocation. This should not happen.
(gnome-terminal-server:2003): Gtk-WARNING **: TerminalNotebook 0x55c58be7c250 is drawn without a current allocation. This should not happen.
(gnome-terminal-server:2003): Gtk-WARNING **: TerminalScreenContainer 0x55c58bbe77a0 is drawn without a current allocation. This should not happen.
(gnome-terminal-server:2003): Gtk-WARNING **: GtkBox 0x55c58be4f590 is drawn without a current allocation. This should not happen.
(gnome-terminal-server:2003): Gtk-WARNING **: GtkScrollbar 0x55c58bf6c320 is drawn without a current allocation. This should not happen.

Somewhat similar ones happen with vteapp too.

A 'git blame' of this warning message points to our former dup bug 765410.

Not sure why these aren't printed with my system GTK which is newer than these warnings in the source.

If only I understood what these mean, and whether VTE is really doing something wrong here... Christian, any clue?
Comment 13 Timm Bäder 2018-01-08 07:56:53 UTC
I can only test vteapp here, but:

1) vte explicitly calls queue_resize from inside a size-allocate handler:
  • #5 VteTerminalPrivate::set_size
    at vte.cc line 8018
  • #6 VteTerminalPrivate::widget_size_allocate
    at vte.cc line 8407
  • #7 g_cclosure_marshal_VOID__BOXED
  • #8 g_type_class_meta_marshal
    at /home/baedert/Source/gnome/glib/gobject/gclosure.c line 997
  • #9 g_closure_invoke
    at /home/baedert/Source/gnome/glib/gobject/gclosure.c line 804
  • #10 signal_emit_unlocked_R
    at /home/baedert/Source/gnome/glib/gobject/gsignal.c line 3565
  • #11 g_signal_emit_valist
    at /home/baedert/Source/gnome/glib/gobject/gsignal.c line 3391
  • #12 g_signal_emit
    at /home/baedert/Source/gnome/glib/gobject/gsignal.c line 3447
  • #13 gtk_widget_size_allocate_with_baseline
    at /home/baedert/Source/gnome/gtk+-3/gtk/gtkwidget.c line 6135

which gtk *really* doesn't like. This can lead to the "widget drawn without current allocation" warnings as well as stale widget state (e.g. scrollbars not updating).

> Not sure why these aren't printed with my system GTK which is newer than these warnings in the source.

IIRC we switched them from being inside a #ifdef G_ENABLE_DEBUG to a #ifdef G_ENABLE_CONSISTENCY_CHECKS -- the latter being only defined if you pass --enable-debug to gtk, which doesn't happen on distro builds.
Comment 14 Egmont Koblinger 2018-01-08 08:34:35 UTC
(In reply to Timm Bäder from comment #13)

> 1) vte explicitly calls queue_resize from inside a size-allocate handler:
> which gtk *really* doesn't like.

Thanks, that's a nice starting point. I'll see if we can change VTE not to do this, and whether that fixes the scrollbar issue.

> the latter being only defined if you pass --enable-debug to gtk

Yup I indeed passed that to my GTK. Looks like in order to have a clean VTE source, we should from time to time test it against debug GTK (glib too?).
Comment 15 Egmont Koblinger 2018-01-08 09:05:25 UTC
Just blindly commenting out that one line doesn't fix anything.

Note that it resides in set_size() which is called once at startup, but obviously not while you scroll later on. Also note that it's not that scrollbars are never updating; they are updating but delayed by one step.

My gut feeling tells me the problem is probably something along these lines, but not this one. I'll try to further investigate (although my GTK+ knowledge is pretty low).
Comment 16 Egmont Koblinger 2018-01-08 20:25:03 UTC
Created attachment 366515 [details] [review]
g-t workaround

I played a bit more in VTE and couldn't find what the problem could be.

Even in vteapp or gnome-terminal, querying the scrollbar widget's value reports the desired value. Queueing a redraw there doesn't fix it, queueing with a timeout of 0 does.

See the attached workaround for gnome-terminal. (Of course it's a nasty workaround, and what makes it even worse is that it goes in g-t and not vte, so every vte-based app would have to duplicate. That being said, do we want to submit it to gnome-terminal before we catch the real issue?)
Comment 17 Egmont Koblinger 2018-01-18 21:04:10 UTC
In gnome-terminal, the bug occurs only if the tab bar is not present. If the tab bar is present for whatever reason (multiple tabs, or /org/gnome/terminal/legacy/tab-policy: 'always'), the scrollbar is updated as expected. [It's irrelevant whether there are other g-t windows open.]
Comment 18 Egmont Koblinger 2018-01-18 21:20:51 UTC
Created attachment 367045 [details] [review]
vte workaround, v1

Don't ask :)
Comment 19 Egmont Koblinger 2018-01-18 21:35:58 UTC
Created attachment 367047 [details] [review]
vte workaround, v2

A different workaround (not necessarily better or worse) than the previous one.
Comment 20 Egmont Koblinger 2018-01-18 22:04:25 UTC
A rough picture of what's going on:

VTE's scrollbar handling is complex because it does not immediately update the adjustment's value and such, but adds a layer of asynchronity to only do so on display update.

Presumably this is done for efficiency reasons, e.g. when tons of output is procuded. If it updated the scrollbar's position on every newline received in the input stream, it'd result with a value-changed signal getting emitted on every newline, and so on.

I am not sure if this is really measurable, and I'm sure GTK+ is clever enough not to immediately repaint the scrollbar upon every such change, thousands of times per second. We should try to remove this layer of asynchronity and see if the performance cost is significant. If not, this code simplification would be totally worth it.

---

VTE intrinsically relies on the GtkAdjustment object to do its bookkeeping about the scroll position (it's not exactly clear to me how). At the top of widget_set_vadjustment() it's guaranteed that such object is created internally even if not specified externally. I can't see the design decision here, but I'm not a fan of this approach. I'd expect VTE to do its own bookkeeping, and the adjustment just being an optional "plugin" that shows/alters scroll_delta. We should perhaps see if we can change our code in this direction, and whether it'd simplify/fix anything.

---

Upon pressing Shift+PgUp or friends, basically the following happens [with obvious simplifications]:

First scroll_lines() is called, which calls queue_adjustment_value_changed() and then returns.

Asynchronously a bit later, from an update or process cycle, emit_adjustment_changed() is invoked which calls gtk_adjustment_set_value() with a "little dance" (see the comment in that method) which is a terribly hard to understand hack, and is eliminated in workaround patch v1 above. This gtk_adjustment_set_value() emits value-changed, that is, calls vadjustment_value_changed(). This latter method, if the "little dance" hack was in place, calls invalidate_all() which in turn calls gtk_widget_queue_draw() on the terminal widget.

It seems to me that calling gtk_widget_queue_draw() from gtk_adjustment_set_value()'s value-changed signal is too heavy work for GTK+ and doesn't like it; at least that's what triggers the delayed scrollback. If you comment out that call, or schedule it to be run from the main loop [g_timeout(0, ...)] then the issue no longer happens, this is what my workaround patch v2 demonstrates.

---

Timm, this assumption of mine from the last paragraph is pretty much of the same spirit as your "which gtk *really* doesn't like" from comment 13, am I right? Does GTK+ really dislike doing so "heavy" work from signal handlers? Is there a generic guideline on what's okay and what's not?
Comment 21 Timm Bäder 2018-01-19 10:28:36 UTC
The amount of work you do in such a handler should be irrelevant, GTK+ master in fact even does a queue_allocate in a value-changed handler: https://git.gnome.org/browse/gtk+/tree/gtk/gtkviewport.c#n562

A queue_draw inside a draw() signal is also usually no problem since it does not mutate the widget tree in any way but only marks a region in the widget's window to be redrawn next frame (and we do call queue_draw inside draw() for the GTK_DEBUG=resize functionality).


>I am not sure if this is really measurable, and I'm sure GTK+ is clever enough >not to immediately repaint the scrollbar upon every such change, thousands of >times per second. We should try to remove this layer of asynchronity and see if >the performance cost is significant. If not, this code simplification would be >totally worth it.

A redraw happens at most once per frame of course. Optimizing a queue_draw away might be a reason but I'm not sure if that's really a bottleneck for redrawing the vte widget.

FWIW, gtk has a GTK_DEBUG=geometry env var which will tell you when a queue_resize inside size_allocate happens, but it does so in an entirely useless way like "widget Foobar or a child..." which doesn't tell you anything in the end. I always add a local change like https://paste.fedoraproject.org/paste/f8F~a38Seo9ETkmt5zCnZg so I can use gdb and G_DEBUG=fatal-warnings and get a backtrace at the exact moment the queue_resize is being called.

In case none of this helps or the actual problem ends up being in gtk+ (which it can absolutely be, iirc there's even a problem like this with GtkAdjustment/GtkViepwort itself...), I think using a g_idle_add instead of a g_timeout with a 0 delay makes more sense.
Comment 22 Egmont Koblinger 2018-01-19 19:48:01 UTC
Created attachment 367106 [details] [review]
vte workaround, v2b

Christian, I'd like to submit this workaround until we get to the bottom of the issue. What do you think? I'll be happy to work on the outlined simplifications afterwards to see if it'd have any noticeable performance penalty, and to see if it happens to avoid the current problem. (It would hopefully also cleanly address bug 709089.)

Timm, I appreciate your guidance, but based on that I again suspect a GTK+ bug, and given my zero experience with GTK+ internals I don't think I could get to the bottom of the issue in reasonable time. Out of curiosity I might devote a couple of hours to this, but I don't have high hopes. Ideally a GTK+ developer would help us out here by taking a closer look. (Do you happen to have time/motivation? :-))
Comment 23 Egmont Koblinger 2018-01-19 20:05:28 UTC
(In reply to Timm Bäder from comment #21)

> FWIW, gtk has a GTK_DEBUG=geometry env var which will tell you when a
> queue_resize inside size_allocate happens, but it does so in an entirely
> useless way like "widget Foobar or a child..." which doesn't tell you
> anything in the end.

To clarify: Although at some point some warnings around resize were mentioned, they are not what this bug is about.

In this bug there's no resize occurring at all, and the upper/lower/pagesize/etc. parameters of the adjustment aren't modified either. The only thing that gets modified is the adjustment's value. This is updated correctly in the adjustment object, but not painted on the UI.
Comment 24 Egmont Koblinger 2018-01-19 21:43:25 UTC
The behavior got broken with GTK+ commit 71a5f07620 (on master / 3-22) aka. commit f3dd4a6fee (on 3-20 for 3.20.2):

    widget: queue a redraw only if resize highlighting is enabled
    
    e8aa9b0440e03e7002323922f862342db51c5f32 introduced a new debug mode
    that highlights resizes. Unfortunately it has the side effect of
    always queueing redraws even when the debug mode is not enabled.
    Make the redraw conditional.

All the code it touches is within #ifdef G_ENABLE_DEBUG; seems to me that some level of debugging is enabled by default? I'm yet to see what happens with debugging disabled.

And, contrary to my previous comment, this commit revolves around resize, so maybe there's a resize involved somehow under the hood??
Comment 25 Egmont Koblinger 2018-01-19 22:25:24 UTC
With --disable-debug, the faulty behavior predates the aforementioned change. Which means that this change just accidentally happens to no longer workaround the bug in minimal debug mode, but it isn't the culprit.

3.20.0 is broken, 3.18.9 is correct. Time for another git bisect.
Comment 26 Egmont Koblinger 2018-01-19 23:59:32 UTC
So, with --disable-debug, the culprit seems to be commit 4bb0a8db47

    range: first pass at porting to gadgets
    
    There's still a lot to be done, but this is functional and we'll improve
    the loose ends in the next commits.
Comment 27 Timm Bäder 2018-01-20 06:56:41 UTC
Can you post a simple test case (that's ideally not all of g-t with the rather unpractical client/server split) and updated instructions in how to reproduce it? the original ones didn't quite work for me, just a different problem with the vteapp one.
Comment 28 Egmont Koblinger 2018-01-20 09:27:30 UTC
I'm still pretty much using the original steps, with the vte test app from current git (./src/app/vte-2.91), and LD_LIBRARY_PATH pointing to the test Gtk+ installation.

With some Gtk+ versions the testapp window opens small, in that case I manually resize it to rughly 80x24-ish.

Then I execute "seq 30" so that there's a few lines scrolled out, and press Shift+Ctrl+Up and Shift+Ctrl+Down randomly a couple of times to scroll by one line, keeping my eyes on the scrollbar. Its update is obviously delayed.

Then "seq 100" for a slightly bigger scrollbar, and Shift+PageUp and Shift+PageDown", or even Shift+Home and Shift+End just to notice the same buggy behavior again.

I hope these work for you; if not then... then I don't know how to continue :)
Comment 29 Egmont Koblinger 2018-01-20 09:30:00 UTC
> Then "seq 100" for a slightly bigger scrollbar, and Shift+PageUp and

Typo: "seq 100" for a slightly bigger *scrollback* buffer (shorter scrollbar)
Comment 30 Timm Bäder 2018-01-20 16:17:06 UTC
Don't ask me about the GDK background of any of this, but calling gdk_window_process{_all,}_updates() is IME *never* a good idea. And vte.cc does it quite a lot. The scrollbar gets redrawn out of band so when such a "delayed update" happens, the GtkRange doesn't queue a redraw on itself but just an allocate (or a resize now rather, but I think that's just coming from the original workaround for this bug). It'll reach size_allocate and thus reposition the slider gadget, but it won't redraw itself after that. I didn't dig deeper since I'm fuzzy on the details in gtk3 as well, but removing the gdk_window_process_updates as well as the two gdk_window_process_all_updates calls in vte.cc make your test case work for me. Might break other things of course.
Comment 31 Egmont Koblinger 2018-01-20 16:25:58 UTC
This looks like an amazing catch, thanks a lot! Indeed removing these calls seems to fix the bug for me, without side effects during the first 10 seconds of testing :-) I'll keep using this patch and report my findings.

I'm not really familiar with areas of VTE where such calls happens, and absolutely not familiar with these offending calls themselves at all. Christian, do you have any insight here?
Comment 32 Egmont Koblinger 2018-01-20 16:46:06 UTC
Created attachment 367153 [details] [review]
vte, hopefully proper fix

Remove gdk_window_process{_all,}_updates(), plus the obvious nearby cleanups.

A now-removed comment in the source, plus relevant changelog entries and bugs mention GTK+ DirectFB / Quartz backends from ages ago, is that something that still exists and we care about, and if so, how to test?
Comment 33 Egmont Koblinger 2018-01-20 17:00:28 UTC
DirectFB was removed in 3.0.
Quartz is some Mac thingy, I'd need a Mac to test it, so I won't :P
Comment 34 Christian Persch 2018-01-25 20:55:04 UTC
Comment on attachment 367153 [details] [review]
vte, hopefully proper fix

(In reply to Egmont Koblinger from comment #32)
> Created attachment 367153 [details] [review] [review]
> vte, hopefully proper fix
> 
> Remove gdk_window_process{_all,}_updates(), plus the obvious nearby cleanups.
> 
> A now-removed comment in the source, plus relevant changelog entries and
> bugs mention GTK+ DirectFB / Quartz backends from ages ago, is that
> something that still exists and we care about, and if so, how to test?

Those gdk calls have been in vte since forever, and are likely a gtk-1 leftover. I just have been to scared of them to try and eliminate them :-)

Have you tried a benchmark (like catting a huge "ls -lR /" output file) to see if this causes a perf regression (or improvement!) ? If there's no perf penalty, ok to commit to vte.
Comment 35 Egmont Koblinger 2018-01-25 21:42:10 UTC
> Have you tried a benchmark (like catting a huge "ls -lR /" output file)

Using my standard test file (that I've had for years now), measuring the wall clock time of cat'ing it, it's 3.4–3.7s without scrollback buffer, 5.4–6.0s with 50k lines of scrollback. It's of a huge variance and is a truly imprecise method for measuring, I know :) Anyway, I can't see any difference caused by this change.

And, for the record, the change has been working flawlessly for 5 days.
Comment 36 Egmont Koblinger 2018-01-27 19:27:08 UTC
Comment on attachment 367153 [details] [review]
vte, hopefully proper fix

Timm, many thanks again, there's no way I could have fixed it in a reasonable amount of time :-) Committed on your behalf.
Comment 37 Egmont Koblinger 2018-02-02 14:13:32 UTC
Good news: According to bug 778926 comment 16 this change actually causes a huge performance improvement in certain cases. :)