After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 794214 - Obscured VteTerminals may become non-responsive until clicked on
Obscured VteTerminals may become non-responsive until clicked on
Status: RESOLVED FIXED
Product: vte
Classification: Core
Component: general
0.50.x
Other All
: Normal normal
: ---
Assigned To: VTE Maintainers
VTE Maintainers
: 690114 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2018-03-09 18:27 UTC by Debarshi Ray
Modified: 2018-10-25 18:43 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Reproducer scripts (10.00 KB, application/x-compressed-tar)
2018-03-09 18:31 UTC, Debarshi Ray
  Details
Toy VTE front-end (3.37 KB, text/plain)
2018-03-09 18:37 UTC, Debarshi Ray
  Details
widget: Restore it to the list of active terminals when visible (1.00 KB, patch)
2018-03-09 18:53 UTC, Debarshi Ray
committed Details | Review
widget: Remove GdkVisibilityState handling (5.93 KB, patch)
2018-03-14 12:11 UTC, Debarshi Ray
committed Details | Review
widget: Remove GdkVisibilityState handling (5.69 KB, patch)
2018-04-03 10:27 UTC, Debarshi Ray
committed Details | Review
[vte-0-50] widget: Remove GdkVisibilityState handling (5.75 KB, patch)
2018-04-03 10:54 UTC, Debarshi Ray
committed Details | Review

Description Debarshi Ray 2018-03-09 18:27:44 UTC
It's from this RHEL 6 bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1546615

Access is currently restricted, so I will describe it myself.

Spawning a lot of VteTerminals, in the order of a few hundred, under a non-composited X session, with one widget per process (similar to "gnome-terminal --disable-factory") may leave some of the widgets in an inactive state where it is no longer polling for incoming bytes from the pseudo-terminal master. See the attached reproducer.

My understanding is that the code that disables the pseudo-terminal handling when the widget is fully obscured doesn't restore it to the list of active widgets when it is no longer active. If the timings are just about right, then the widget will stop updating until some other event causes the pseudo-terminal master's file descriptor to be added back to the GMainContext. An example of such an event is to click the inactive widget. VTE will interpret the button press as a potential attempt to select text, and once the button is released it will restore the file descriptor to the GMainContext.

I believe that having just one VteTerminal instance per process increases the chances of reproducing the bug because it's easier to reset the list of active terminals to NULL. Similarly, it's necessary to have a non-composited X session because GtkWidget::visibility-notify-event doesn't work under composited windowing systems.
Comment 1 Debarshi Ray 2018-03-09 18:31:27 UTC
Created attachment 369523 [details]
Reproducer scripts

Run loop.sh to reproduce the bug. The logdsp.sh script can be modified to change terminal emulator front-end.
Comment 2 Debarshi Ray 2018-03-09 18:37:02 UTC
Created attachment 369524 [details]
Toy VTE front-end

This is a toy VTE front-end similar to "gnome-terminal --disable-factory" that can be possibly used to reproduce the bug. By its very nature, the bug is timing sensitive so it might not work, but I used it successfully inside a RHEL (or CentOS) 6.8 VM.

It installs a custom GLib log handler that redirects the logs to a file suffixed by the terminal emulator's process ID. Segregating the logs from each process made it easier to track each instance.
Comment 3 Debarshi Ray 2018-03-09 18:53:45 UTC
Created attachment 369525 [details] [review]
widget: Restore it to the list of active terminals when visible

I wonder if it makes more sense to stop using GtkWidget::visibility-notify-event, instead. In 2018, most users have a composited windowing system where it's not possible to detect the visibility of widgets, which is probably why nobody reported this problem before.
Comment 4 Egmont Koblinger 2018-03-09 21:11:32 UTC
Thanks a lot!

I can indeed reproduce the bug under icewm, but not anymore with the attached patch.

I don't know how this "list of active terminals" thingy is supposed to work, I don't even understand the concept, not even why it's necessary, so at this moment (without studying the source) I cannot comment anything more on the patch than a "works for me".

(Not sure what is supposed to be written in the log files, they are all empty for me.)

> I wonder if it makes more sense to stop using
> GtkWidget::visibility-notify-event, instead. In 2018, most users have a
> composited windowing system where it's not possible to detect the visibility
> of widgets

I'd _guess_ fully-obscured is still expected to be reported for the non-focused tabs of the tabbed (notebook) interface of gnome-terminal.

Adding some debugging messages doesn't confirm this though (on compositing unity7): apart from some ephemeral transitional period of fully-obscured and even partial (like, how???), inactive tabs also seem to settle at "unobscured".

I have no idea how and why visibility should influence input processing at all, so I'm fine totally ignoring visibility changes. I hope Christian knows more about the background.

> which is probably why nobody reported this problem before.

I do recall some (maybe 3 or so) bugreports pretty similar to this, about input processing sometimes stopping in obscured windows. No clue where to look for them, they might have been here, or in distro bugtrackers, or stackoverflow sites.
Comment 5 Christian Persch 2018-03-09 21:44:48 UTC
Comment on attachment 369525 [details] [review]
widget: Restore it to the list of active terminals when visible

Wouldn't it be simpler, as well as more correct, to call add_upate_timeout()? Or how else is the update timer going to be restarted?
Comment 6 Christian Persch 2018-03-09 21:51:15 UTC
(In reply to Egmont Koblinger from comment #4) 
> > I wonder if it makes more sense to stop using
> > GtkWidget::visibility-notify-event, instead. In 2018, most users have a
> > composited windowing system where it's not possible to detect the visibility
> > of widgets
> 
> I'd _guess_ fully-obscured is still expected to be reported for the
> non-focused tabs of the tabbed (notebook) interface of gnome-terminal.
> 
> Adding some debugging messages doesn't confirm this though (on compositing
> unity7): apart from some ephemeral transitional period of fully-obscured and
> even partial (like, how???), inactive tabs also seem to settle at
> "unobscured".

It's likely this hasn't worked for years, at least since gtk switched to client-side windows?

We may need to add API so we can en/disable update processing, to be called when we set a tab visible/invisible. Or is gtk filtering out invalidations from nonactive tabs?

> I have no idea how and why visibility should influence input processing at
> all, so I'm fine totally ignoring visibility changes. I hope Christian knows
> more about the background.

No idea, sorry. All this predates my involvement in vte; it looks like this is a early-gtk2 or even gtk1 hold-over.
Comment 7 Egmont Koblinger 2018-03-09 21:59:58 UTC
(In reply to Christian Persch from comment #6)

> We may need to add API so we can en/disable update processing, to be called
> when we set a tab visible/invisible. Or is gtk filtering out invalidations
> from nonactive tabs?

Yup, at least I always thought it was gtk simply not calling widget_draw on nonactive tabs. You made me uncertain now, was it perhaps vte's update processing stuff? This area of the code is totally unclear to me.

Looks like another area where a thorough cleanup (along with addressing bug 744774) is desired. (But let's have a quick fix first :))
Comment 8 Debarshi Ray 2018-03-10 09:14:45 UTC
(In reply to Egmont Koblinger from comment #4)
> I don't know how this "list of active terminals" thingy is supposed to work,
> I don't even understand the concept, not even why it's necessary

My understanding is that the various timeout sources in VTE are shared across all "active" VteTerminal instances in the process. One of the criteria for defining "active" is whether the VteTerminal is at least partly visible or not.

I am not sure why it's necessary to share the timeouts. I can imagine that it is done to limit the number of GSources. There's also a comment that says "maintain fairness between multiple terminals", but I didn't read the code in sufficient detail to understand what it means.

> (Not sure what is supposed to be written in the log files, they are all
> empty for me.)

They are empty because you are normally not logging anything. Try building vte with --enable-debug and run the terminal with the right environment variables, or sprinkle g_message:s.

> > I wonder if it makes more sense to stop using
> > GtkWidget::visibility-notify-event, instead. In 2018, most users have a
> > composited windowing system where it's not possible to detect the visibility
> > of widgets
> 
> I'd _guess_ fully-obscured is still expected to be reported for the
> non-focused tabs of the tabbed (notebook) interface of gnome-terminal.

I don't think so. I think that it's a GDK thing and hence talking about visibility in windowing system terms - obscured top-level GtkWindows and so on.

> I have no idea how and why visibility should influence input processing at
> all, so I'm fine totally ignoring visibility changes. I hope Christian knows
> more about the background.

Maybe it was a performance thing?

However, it looks a bit dangerous to me. eg., in the test case when a VteTerminal gets stuck, I found the shell stuck in a write(2) to the pseudo-terminal slave. I'd have expected the write to succeed by dumping the bytes to a kernel buffer backing the pseudo-terminal. I wonder if that means that fully obscured terminals can actually block the processes running within them.

I'd say that if someone is using a gazillion VteTerminal instances, then they should be prepared to pay for some performance in exchange. :)

> > which is probably why nobody reported this problem before.
> 
> I do recall some (maybe 3 or so) bugreports pretty similar to this, about
> input processing sometimes stopping in obscured windows. No clue where to
> look for them, they might have been here, or in distro bugtrackers, or
> stackoverflow sites.

Oh, I see.
Comment 9 Debarshi Ray 2018-03-10 09:19:12 UTC
(In reply to Christian Persch from comment #5)
> Comment on attachment 369525 [details] [review] [review]
> widget: Restore it to the list of active terminals when visible
> 
> Wouldn't it be simpler, as well as more correct, to call
> add_upate_timeout()? Or how else is the update timer going to be restarted?

The update timer is restarted by the various invalidation methods as long as there is at least one active terminal. GDK_VISIBILITY_UNOBSCURED is handled right there by calling invalidate_all, while the PARTIAL case is handled in a more roundabout manner via through the other invalidate_* methods.
Comment 10 Egmont Koblinger 2018-03-10 10:29:18 UTC
(In reply to Debarshi Ray from comment #8)

> I am not sure why it's necessary to share the timeouts. I can imagine that
> it is done to limit the number of GSources. There's also a comment that says
> "maintain fairness between multiple terminals",

Probably yes to some extent; I'd rather guess fairness between the terminals and the rest of the app.

> However, it looks a bit dangerous to me. eg., in the test case when a
> VteTerminal gets stuck, I found the shell stuck in a write(2) to the
> pseudo-terminal slave. I'd have expected the write to succeed by dumping the
> bytes to a kernel buffer backing the pseudo-terminal.

This is a finite sized buffer. Once filled up, there's nothing else the kernel could do than to block the writer until the reader frees up some space. It's solely a VTE bug if it forgets to read that data, there's no one else to blame.

> I wonder if that means
> that fully obscured terminals can actually block the processes running
> within them.

Looks so :(
Comment 11 Debarshi Ray 2018-03-11 08:48:14 UTC
(In reply to Egmont Koblinger from comment #10)
> > However, it looks a bit dangerous to me. eg., in the test case when a
> > VteTerminal gets stuck, I found the shell stuck in a write(2) to the
> > pseudo-terminal slave. I'd have expected the write to succeed by dumping the
> > bytes to a kernel buffer backing the pseudo-terminal.
> 
> This is a finite sized buffer. Once filled up, there's nothing else the
> kernel could do than to block the writer until the reader frees up some
> space.

Yes, but I have seen stuck VteTerminals that are completely empty - not even the window title had changed. So either the shell wrote a bunch of bytes, filled up the buffer and then got stuck, while VTE removed the fd from the GMainContext before even reading a single byte; or it was something else.
Comment 12 Egmont Koblinger 2018-03-11 12:32:59 UTC
(In reply to Debarshi Ray from comment #11)

> Yes, but I have seen stuck VteTerminals that are completely empty

That's what I saw too...

> not even the window title had changed.

although I didn't pay attention to this detail :)

> So either the shell wrote a bunch of bytes,
> filled up the buffer and then got stuck, while VTE removed the fd from the
> GMainContext before even reading a single byte;

I'm pretty sure it's this; especially since your fix seems to fix this issue.

> or it was something else.

IMO that's damn unlikely.
Comment 13 Debarshi Ray 2018-03-14 12:11:33 UTC
Created attachment 369668 [details] [review]
widget: Remove GdkVisibilityState handling
Comment 14 Egmont Koblinger 2018-03-14 20:30:54 UTC
> > not even the window title had changed.
> 
> although I didn't pay attention to this detail :)

My csh (tcsh) literally prints the echo's "-e" parameter and the rest. I had to change it to a printf to change the title.
Comment 15 Egmont Koblinger 2018-03-14 20:42:30 UTC
(In reply to Debarshi Ray from comment #13)
> Created attachment 369668 [details] [review] [review]
> widget: Remove GdkVisibilityState handling

I tried again your first patch, with the loop of 4096 beefed up to 40960, and continuously manually re-focusing (re-raising) the gnome-terminal where I launched the script from. Occasionally I still got stuck terminals now.

I didn't have full faith in that patch anyway because none of us seems to understand what exactly is going on :)

The second one looks much more promising and clean to me, looks the right approach to be taken, and didn't get stuck at all during some stress-testing.

I haven't properly verified (and not planning to) your second patch, after a quick glimpse it looks okay. I vote for submitting the two patches combined into one (I'd skip the intermediate step as it's still not bugfree), and cherry-pick to 0-52 (maybe older ones too). But first let's wait for Christian, and maybe keep testing this patch for a couple of days...

Thanks a lot again!
Comment 16 Christian Persch 2018-04-02 21:40:45 UTC
Comment on attachment 369668 [details] [review]
widget: Remove GdkVisibilityState handling

Let's go for the complete removal of the visibility stuff.

Master and 0-52, and, if you want, backport to 0-50.

Thanks!
Comment 17 Debarshi Ray 2018-04-03 10:27:24 UTC
Created attachment 370479 [details] [review]
widget: Remove GdkVisibilityState handling

Since attachment 369525 [details] [review] didn't completely fix the optimization, I squashed it with attachment 369668 [details] [review] and pushed to master.
Comment 18 Debarshi Ray 2018-04-03 10:54:27 UTC
Created attachment 370480 [details] [review]
[vte-0-50] widget: Remove GdkVisibilityState handling

It didn't cleanly apply to vte-0-50 because of some conflicts in src/vte.cc. So, just in case, if you want to take a second look.
Comment 19 Egmont Koblinger 2018-10-25 18:43:55 UTC
*** Bug 690114 has been marked as a duplicate of this bug. ***