GNOME Bugzilla – Bug 647466
Crash in _vte_ring_thaw_one_row:
Last modified: 2014-05-22 21:03:28 UTC
My gnome-terminal just crashed. I _think_ it was triggered by scrolling back in the buffer. Unfortunately I cannot reproduce it so far. All I have in the log is this line: Vte-0.0:ERROR:ring.c:347:_vte_ring_thaw_one_row: assertion failed: (ring->start < ring->writable) My computer is x86-64, with gtk 2.24.4 .
Humm. Thanks. I'll take a look at the code.
BTW, I recently upgraded to glibc 2.13, which contains this bug: http://sources.redhat.com/bugzilla/show_bug.cgi?id=12518 So it might be related to a memcpy with overlapping areas.
Humm, interesting. A quick grep for memcpy doesn't reveal anything, but I'll keep looking.
https://bugzilla.redhat.com/show_bug.cgi?id=1090192 tells us how to reproduce. (Forget the ssh bits.) unset DISPLAY; cacaxine somerandomvideo => likely crash with current vte master.
My cacaxine is a piece of c@@p. It eventually starts printing escape sequences like "\e[\e[40m.34m" which totally messes up the display, and soon afterwards it usually crashes. Anyway... As part of this stupidity, it (probably accidentally) emits control sequences that move/resize/iconify the window (why are such sequences allowed at all?), namely \e[40t and such. The crash is related to resizing (comment out the actual resize in vte -> no more crash).
Timing matters: "cat typescript" crashes, "dd if=typescript bs=1" usually finishes successfully. (typescript is cacaxine's output.) When the first resize occurs, scroll_delta is already different in the two cases. This can be - a totally independent bug (one more we need to fix, yay!), some sequence interpreted differently if split across multiple reads or something like this, making this bug even harder to catch, or - the key to the current issue, e.g. resizing handled asyncronously (but at a first glimpse it seems to me it's synchronous).
I think I'm done with the hard part. The reason for different behavior each time the same "cat typescript" is run is: An escape sequence makes VTE resize itself to let's say 300 rows and also asks the WM to set this size, but the WM kicks back saying "hey, that won't fit, resize back to 50 rows", and this happens asynchronously much later, vte has processed tons of input in between. If you modify vte_sequence_handler_window_manipulation()'s ">= 24" branch to set a small enough upper cap on the number of rows so that they always fit on your display, VTE's behavior (and the crash) becomes 100% reproducible regardless of timing and such.
Something still depends on timing, though. Minimal test case (starting with default 24 rows): echo -ne '\e[?1049h\e[25t\e[100S' # crashes echo -ne '\e[?1049h\e[25t'; sleep 1; echo -ne '\e[100S' # okay
Created attachment 276988 [details] [review] Quick fix (further cleanup is desired) The ring's size is not updated immediately upon resize, only when Gtk calls back to us: Gtk size-allocate signal => vte_terminal_size_allocate() => vte_terminal_set_scrollback_lines() passing it the previous value (really not a nice pattern) => _vte_ring_resize(). It would be nice to do a big cleanup of this code...
Created attachment 276992 [details] [review] Quick fix v2 - for 0.36 v2 - my proposal to be applied to 0.36. I wasn't fully satisfied with the first version, as there was still actual terminal work done in the Gtk+ callback. Note: The "if (screen->scrolling_restricted)" branch was dead, since vte_terminal_set_size() unsets that flag. Removing it makes it simpler to verify that actual work is done in the same order as previously.
Created attachment 276994 [details] [review] Quick fix v2 - for master v2 - the same fix for master. There's a LOT to be refactored, cleaned up... Clamping the cursor position should happen in vte_terminal_screen_set_size(), but its order with _vte_ring_resize() [called from vte_terminal_set_scrollback_lines()] can't be swapped now. On the normal screen, the ring can end sooner than the bottom of the screen (some empty lines are not covered by the ring), and all around VTE there are ugly hacks to work this around. On the alternate screen, the ring should cover the entire screen - but due to this bug, it wasn't always the case. And once there were lines not covered by the ring on the alternate screen, it crashed. (Don't ask why.) Anyway, I think the cleanup should begin by addressing bug 708213 comment 29.
lgtm if it looks good to you ;). Thanks.
Committed: master 4425ee0, vte-0-36 b2f437c. Probably no one will ever be able to tell if I actually fixed the original crash – that happened once three years ago with no reproducible test case. I'm closing as fixed, please reopen if it still crashes.