After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 647466 - Crash in _vte_ring_thaw_one_row:
Crash in _vte_ring_thaw_one_row:
Status: RESOLVED FIXED
Product: vte
Classification: Core
Component: general
0.28.x
Other Linux
: Normal critical
: ---
Assigned To: VTE Maintainers
VTE Maintainers
Depends on:
Blocks:
 
 
Reported: 2011-04-11 15:33 UTC by Eric Piel
Modified: 2014-05-22 21:03 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Quick fix (further cleanup is desired) (740 bytes, patch)
2014-05-22 13:19 UTC, Egmont Koblinger
none Details | Review
Quick fix v2 - for 0.36 (3.00 KB, patch)
2014-05-22 14:45 UTC, Egmont Koblinger
committed Details | Review
Quick fix v2 - for master (3.01 KB, patch)
2014-05-22 14:50 UTC, Egmont Koblinger
committed Details | Review

Description Eric Piel 2011-04-11 15:33:23 UTC
My gnome-terminal just crashed. I _think_ it was triggered by scrolling back in the buffer.

Unfortunately I cannot reproduce it so far. All I have in the log is this line:
Vte-0.0:ERROR:ring.c:347:_vte_ring_thaw_one_row: assertion failed: (ring->start < ring->writable)

My computer is x86-64, with gtk 2.24.4 .
Comment 1 Behdad Esfahbod 2011-04-11 16:18:32 UTC
Humm.  Thanks.  I'll take a look at the code.
Comment 2 Eric Piel 2011-04-12 11:30:16 UTC
BTW, I recently upgraded to glibc 2.13, which contains this bug: http://sources.redhat.com/bugzilla/show_bug.cgi?id=12518

So it might be related to a memcpy with overlapping areas.
Comment 3 Behdad Esfahbod 2011-04-12 22:33:35 UTC
Humm, interesting.  A quick grep for memcpy doesn't reveal anything, but I'll keep looking.
Comment 4 Egmont Koblinger 2014-05-21 22:49:47 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1090192 tells us how to reproduce. (Forget the ssh bits.) unset DISPLAY; cacaxine somerandomvideo => likely crash with current vte master.
Comment 5 Egmont Koblinger 2014-05-21 23:42:10 UTC
My cacaxine is a piece of c@@p. It eventually starts printing escape sequences like "\e[\e[40m.34m" which totally messes up the display, and soon afterwards it usually crashes. Anyway...

As part of this stupidity, it (probably accidentally) emits control sequences that move/resize/iconify the window (why are such sequences allowed at all?), namely \e[40t and such. The crash is related to resizing (comment out the actual resize in vte -> no more crash).
Comment 6 Egmont Koblinger 2014-05-22 00:36:22 UTC
Timing matters: "cat typescript" crashes, "dd if=typescript bs=1" usually finishes successfully. (typescript is cacaxine's output.) When the first resize occurs, scroll_delta is already different in the two cases.

This can be

- a totally independent bug (one more we need to fix, yay!), some sequence interpreted differently if split across multiple reads or something like this, making this bug even harder to catch, or

- the key to the current issue, e.g. resizing handled asyncronously (but at a first glimpse it seems to me it's synchronous).
Comment 7 Egmont Koblinger 2014-05-22 11:13:35 UTC
I think I'm done with the hard part.

The reason for different behavior each time the same "cat typescript" is run is: An escape sequence makes VTE resize itself to let's say 300 rows and also asks the WM to set this size, but the WM kicks back saying "hey, that won't fit, resize back to 50 rows", and this happens asynchronously much later, vte has processed tons of input in between.

If you modify vte_sequence_handler_window_manipulation()'s ">= 24" branch to set a small enough upper cap on the number of rows so that they always fit on your display, VTE's behavior (and the crash) becomes 100% reproducible regardless of timing and such.
Comment 8 Egmont Koblinger 2014-05-22 12:08:39 UTC
Something still depends on timing, though. Minimal test case (starting with default 24 rows):

echo -ne '\e[?1049h\e[25t\e[100S'                        # crashes
echo -ne '\e[?1049h\e[25t'; sleep 1; echo -ne '\e[100S'  # okay
Comment 9 Egmont Koblinger 2014-05-22 13:19:04 UTC
Created attachment 276988 [details] [review]
Quick fix (further cleanup is desired)

The ring's size is not updated immediately upon resize, only when Gtk calls back to us:

Gtk size-allocate signal => vte_terminal_size_allocate() => vte_terminal_set_scrollback_lines() passing it the previous value (really not a nice pattern) => _vte_ring_resize().

It would be nice to do a big cleanup of this code...
Comment 10 Egmont Koblinger 2014-05-22 14:45:30 UTC
Created attachment 276992 [details] [review]
Quick fix v2 - for 0.36

v2 - my proposal to be applied to 0.36.

I wasn't fully satisfied with the first version, as there was still actual terminal work done in the Gtk+ callback.

Note: The "if (screen->scrolling_restricted)" branch was dead, since vte_terminal_set_size() unsets that flag. Removing it makes it simpler to verify that actual work is done in the same order as previously.
Comment 11 Egmont Koblinger 2014-05-22 14:50:41 UTC
Created attachment 276994 [details] [review]
Quick fix v2 - for master

v2 - the same fix for master.

There's a LOT to be refactored, cleaned up...

Clamping the cursor position should happen in vte_terminal_screen_set_size(), but its order with _vte_ring_resize() [called from vte_terminal_set_scrollback_lines()] can't be swapped now.

On the normal screen, the ring can end sooner than the bottom of the screen (some empty lines are not covered by the ring), and all around VTE there are ugly hacks to work this around.

On the alternate screen, the ring should cover the entire screen - but due to this bug, it wasn't always the case. And once there were lines not covered by the ring on the alternate screen, it crashed. (Don't ask why.)

Anyway, I think the cleanup should begin by addressing bug 708213 comment 29.
Comment 12 Behdad Esfahbod 2014-05-22 18:11:38 UTC
lgtm if it looks good to you ;).  Thanks.
Comment 13 Egmont Koblinger 2014-05-22 20:37:03 UTC
Committed: master 4425ee0, vte-0-36 b2f437c.

Probably no one will ever be able to tell if I actually fixed the original crash – that happened once three years ago with no reproducible test case. I'm closing as fixed, please reopen if it still crashes.