Bug 782715 - xfce4-terminal (and others) crashes when dumping a lot of text
Status: RESOLVED OBSOLETE
Product: vte
Classification: Core
Component: general
Version: 0.46.x
Hardware: Other Mac OS
Importance: Normal major
Target Milestone: ---
Assigned To: VTE Maintainers
QA Contact: VTE Maintainers
Depends on:
Blocks:
 
 
Reported: 2017-05-17 01:01 UTC by Brian Warner
Modified: 2021-06-10 15:22 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
gnutls encryption test (2.56 KB, text/plain)
2017-05-22 05:17 UTC, Brian Warner

Description Brian Warner 2017-05-17 01:01:48 UTC
The downstream Debian sid bug is: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=862591

I'm seeing xfce4-terminal, lilyterm, and other VTE-using terminal programs crash reliably on an arm64 ChromeOS laptop when I dump a lot of text to stdout all at once. Running "cat" on a 1MB file full of the letter "A" is enough to do it.

This is with the Debian package vte2.91-0.46.1. That's the latest version in Debian sid; I haven't tried to reproduce this with any other version.

The stack trace looks like:

Thread 1 "xfce4-terminal" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
  • #0 __GI_raise
    at ../sysdeps/unix/sysv/linux/raise.c line 51
  • #1 __GI_abort
    at abort.c line 89
  • #2 g_assertion_message
  • #3 g_assertion_message_cmpnum
  • #4 _vte_boa_uncompress
    at ././src/vtestream-file.h line 790
  • #5 _vte_boa_read_with_overwrite_counter(VteBoa*, gsize, char*, _vte_overwrite_counter_t*)
    at ././src/vtestream-file.h line 911
  • #6 _vte_boa_read
    at ././src/vtestream-file.h line 922
  • #7 _vte_file_stream_read(VteStream*, gsize, char*, gsize)
    at ././src/vtestream-file.h line 1137
  • #8 _vte_ring_read_row_record
    at ././src/ring.cc line 124
  • #9 _vte_ring_discard_one_row
    at ././src/ring.cc line 417
  • #10 _vte_ring_maybe_discard_one_row
    at ././src/ring.cc line 439
  • #11 _vte_ring_insert(VteRing*, gulong)
    at ././src/ring.cc line 551
  • #12 VteTerminalPrivate::ring_insert(long, bool)
    at ././src/vte.cc line 247
  • #13 VteTerminalPrivate::ring_append(bool)
    at ././src/vte.cc line 257
  • #14 VteTerminalPrivate::insert_rows(unsigned int)
    at ././src/vte.cc line 2188
  • #15 VteTerminalPrivate::update_insert_delta()
    at ././src/vte.cc line 2234
  • #16 VteTerminalPrivate::insert_char(unsigned int, bool, bool)
    at ././src/vte.cc line 2964
  • #17 VteTerminalPrivate::process_incoming()
    at ././src/vte.cc line 3686
  • #18 VteTerminalPrivate::time_process_incoming()
    at ././src/vte.cc line 10428
  • #19 VteTerminalPrivate::process(bool)
    at ././src/vte.cc line 10452
  • #20 update_timeout(gpointer)
    at ././src/vte.cc line 10679
  • #21 g_timeout_dispatch
    at ././glib/gmain.c line 4674
  • #22 g_main_dispatch
    at ././glib/gmain.c line 3203
  • #23 g_main_context_dispatch
    at ././glib/gmain.c line 3856
  • #24 g_main_context_iterate
    at ././glib/gmain.c line 3929
  • #25 g_main_loop_run
    at ././glib/gmain.c line 4125
  • #26 gtk_main
  • #27 main
    at main.c line 330

and the parent's stderr sees the following assertion:

Vte:ERROR:/home/warner/stuff/debian/vte2.91-0.46.1/./src/vtestream-file.h:790:unsigned int _vte_boa_uncompress(char*, unsigned int, const char*, unsigned int): assertion failed (z_ret == Z_OK): (4294967293 == 0)


I can get you a coredump, if that helps.

I don't know what exactly VTE is decompressing here, but I suspect it has to do with the scrollback buffer, since dumping a MB of output will cause a lot of data to get written into the buffer very quickly.

When I get some time, I'll try to track down in the source what sort of error that return code is referring to.

thanks!
 -Brian
Comment 1 Egmont Koblinger 2017-05-17 06:26:52 UTC
Thanks for this report!

> vte2.91-0.46.1 [...] I haven't tried to reproduce this with any other
> version.

No need for that, the relevant code hasn't changed since 0.40.

> assertion failed (z_ret == Z_OK): (4294967293 == 0)

That's zlib's uncompress() returning -3, aka Z_DATA_ERROR: "if the input data was corrupted or incomplete".

> I suspect it has to do with the scrollback buffer

That's correct, the scrollback is written to a temporary (and unlinked) file after compressing (and encrypting) each 64kB block.

Scrollback compression has been working absolutely fine for 2+ years throughout the world in various VTE installations. As such, although this could be a VTE bug, I find it much more likely to be an issue with your zlib, or the encryption layer, or data corruption such as faulty disk or faulty memory.

Do you have encryption enabled? You can tell from, e.g., gnome-terminal's About menu: "+GNUTLS" means enabled, "-GNUTLS" means disabled. If you have it enabled, then I think we can exclude the disk fault possibility, since encryption also comes with a checksum guaranteeing data integrity.

Or is there any chance you are running out of memory or disk space under /tmp?

Could you please recompile VTE after changing vtestream-file.h: replace the last argument of the compress2() call from 1 to 0 (that is, no compression), or maybe try higher compression levels up to 9. Does this make a difference?

Could you also try with encryption both enabled and disabled (./configure --without-gnutls)? Does it crash in both cases?
Comment 2 Egmont Koblinger 2017-05-17 06:32:51 UTC
> [debian] I'm running this on an ARM64 chromebook (an Acer R13), which
> might be an unusual platform, just in case that makes a difference.

Yeah I think it is going to be relevant.
Comment 3 Egmont Koblinger 2017-05-17 06:38:50 UTC
(In reply to Egmont Koblinger from comment #1)

> replace the last argument of the compress2() call from 1 to 0

Irrelevant note to my past self: I should have made it an #ifdef'd constant :-)
Comment 4 Christian Persch 2017-05-17 07:37:04 UTC
Try strace and attach the output from just before the assertion happens?
Comment 5 Brian Warner 2017-05-18 07:43:51 UTC
Good ideas. Turning off encryption makes the problem go away.

strace shows that the last thing it does is open a tempfile, ftruncate() it to 64KiB, pwrite64() some data into the first 6120 bytes, then pread64() the whole 64KiB. The next thing it does is write the assertion message to stderr. The data being written has a short (5-byte) header and then a whole bunch of zeros, which is the hint that something is going wrong (neither compressed nor compressed+encrypted data should be all zeros).

I've traced it far enough to see that something is going wrong in the gnutls_cipher_encrypt() call inside _vte_boa_encrypt(). After gnutls_cipher_encrypt(), the data buffer is all zeros. That's clearly not going to decrypt to the correct (compressed) data, which explains why the decompression function is giving us an assert.

I have a short test program that only uses gnutls functions, which emits a random-looking ciphertext on an x86 machine, but emits an all-zero ciphertext on my arm64 chromebook, so it's clearly gnutls that's at fault here. Maybe gnutls is trying to use hardware crypto support on this CPU, and it's not really supported, or something. I'll investigate further.

I don't yet know why the decryption function isn't catching this. My current theory is that the tag is somehow correct for the all-zeros ciphertext, but of course it will decrypt to random garbage.
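
Editorial sketch (not part of the original comment): the "header followed by nothing but zeros" symptom described above can be checked mechanically. The helper below is a minimal illustration, assuming a block layout of a fixed-size header followed by the payload; the name payload_is_all_zero and its parameters are hypothetical and do not exist in VTE or gnutls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: return true when every byte after the first
 * header_len bytes of buf is zero. Compressed or encrypted data should
 * look random, so an all-zero payload is a strong hint of a bug. */
static bool payload_is_all_zero(const unsigned char *buf, size_t len,
                                size_t header_len)
{
    assert(buf != NULL);

    /* No payload at all: nothing to judge, so do not flag it. */
    if (header_len >= len)
        return false;

    for (size_t i = header_len; i < len; i++) {
        if (buf[i] != 0)
            return false;
    }
    return true;
}
```

With the numbers from the strace output, one would call it as payload_is_all_zero(block, 6120, 5) on the bytes that were passed to pwrite64().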
Comment 6 Christian Persch 2017-05-18 08:18:12 UTC
(BTW:

        unsigned int z_ret;

        z_ret = compress2 ((Bytef *) dst, &dstlen_ulongf, (const Bytef *) src, srclen, 1);
        g_assert_cmpuint (z_ret, ==, Z_OK);

compress2 and uncompress2 return int, not unsigned int, so that explains the 4294967293 instead of -3. Should change that to int and g_assert_cmpint(), I think.)

(In reply to Brian Warner from comment #5)
> I've traced it far enough to see that something is going wrong in the
> gnutls_cipher_encrypt() call inside _vte_boa_encrypt(). After
> gnutls_cipher_encrypt(), the data buffer is all zeros. That's clearly not
> going to decrypt to the correct (compressed) data, which explains why the
> decompression function is giving us an assert.
> 
> I have a short test program that only uses gnutls functions, which emits a
> random-looking ciphertext on an x86 machine, but emits an all-zero
> ciphertext on my arm64 chromebook, so it's clearly gnutls that's at fault
> here. Maybe gnutls is trying to use hardware crypto support on this CPU, and
> it's not really supported, or something. I'll investigate further.

Ok.
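
Editorial sketch (not part of the original comment): the signed/unsigned mismatch noted at the top of this comment can be reproduced without zlib. Converting an int to unsigned int wraps modulo UINT_MAX + 1, so -3 becomes UINT_MAX - 2, which is 4294967293 on platforms where unsigned int is 32 bits wide (an assumption here). The function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Mimic storing an int status code in an unsigned int variable,
 * as the z_ret declaration quoted above does. */
static unsigned int status_as_unsigned(int status)
{
    /* Well-defined in C: the value wraps modulo UINT_MAX + 1. */
    return (unsigned int) status;
}

/* Show how a negative status such as -3 (Z_DATA_ERROR) appears once
 * it has been stored in an unsigned variable. */
static void check_wraparound(void)
{
    assert(status_as_unsigned(-3) == UINT_MAX - 2u);
    assert(status_as_unsigned(0) == 0u);
}
```

Declaring the variable as int and comparing with a signed assertion, as suggested above, would make the failure message show -3 directly.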
Comment 7 Egmont Koblinger 2017-05-18 11:06:15 UTC
Nice investigation, thanks. I'm also somewhat glad that my guess about zlib or gnutls was correct.

I guess it's time to loop in the gnutls devs.
Comment 8 Christian Persch 2017-05-20 11:35:36 UTC
The gnutls functions we use (gnutls_cipher_encrypt(), gnutls_cipher_tag(), gnutls_cipher_decrypt(), gnutls_cipher_init()) return an int status code, but we never check that it is == GNUTLS_E_SUCCESS (0). We should add g_assert_cmpint()'s for the return code after the calls, just to be on the safe side.
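
Editorial sketch (not part of the original comment): the pattern being proposed is to compare every int status code against the success value instead of discarding it. The helper below is a generic illustration only; status_ok and STATUS_SUCCESS are hypothetical names, it does not call gnutls, and it reports the failure rather than aborting the way g_assert_cmpint() would.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for GNUTLS_E_SUCCESS, which the comment above gives as 0. */
#define STATUS_SUCCESS 0

/* Hypothetical helper: log a failed call on stderr and tell the caller
 * whether the status code meant success. */
static bool status_ok(const char *call_name, int status)
{
    assert(call_name != NULL);

    if (status != STATUS_SUCCESS) {
        fprintf(stderr, "%s failed with status %d\n", call_name, status);
        return false;
    }
    return true;
}
```

A caller would wrap each call site, for example status_ok("encrypt", rc), where rc is the value returned by the cipher call, and bail out of the write path when it returns false.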
Comment 9 Brian Warner 2017-05-22 05:15:54 UTC
It looks like hardware acceleration on aarch64 is the culprit. I'll attach a test program which reports an all-zeros ciphertext when compiled against a gnutls library with acceleration turned on, and reports the right ciphertext when gnutls is configured with --disable-hardware-acceleration.

I'll file a bug with the gnutls site.
Comment 10 Brian Warner 2017-05-22 05:17:58 UTC
Created attachment 352335 [details]
gnutls encryption test

compile with: gcc -o tls-bug -g -Wall tls-bug.c -lgnutls
Comment 11 Brian Warner 2017-05-22 05:39:39 UTC
gnutls bug filed: https://gitlab.com/gnutls/gnutls/issues/204
Comment 12 Egmont Koblinger 2017-05-22 09:58:08 UTC
(In reply to Brian Warner from comment #5)

> My current
> theory is that the tag is somehow correct for the all-zeros ciphertext, but
> of course it will decrypt to random garbage.

It could also be that the tag is not correct either, it's just the lack of hw acceleration (or whatever the bug is) that makes it believe that's the correct one.

It's quite irrelevant though :)

(In reply to Christian Persch from comment #8)

> We should add
> g_assert_cmpint()'s for the return code after the calls, just to be on the
> safe side.

Will do.
Comment 13 GNOME Infrastructure Team 2021-06-10 15:22:38 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/vte/-/issues/2406.