GNOME Bugzilla – Bug 782715
xfce4-terminal (and others) crashes when dumping a lot of text
Last modified: 2021-06-10 15:22:38 UTC
Downstream Debian sid bug: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=862591
I'm seeing xfce4-terminal, lilyterm, and other VTE-using terminal programs crash reliably on an arm64 ChromeOS laptop when I dump a lot of text to stdout all at once. Running "cat" on a 1MB file full of the letter "A" is enough to do it.
This is with the Debian package vte2.91-0.46.1, which is the latest version in Debian sid; I haven't tried to reproduce this with any other version.
The stack trace looks like:
Thread 1 "xfce4-terminal" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
and the following assertion failure appears on the parent's stderr:
int _vte_boa_uncompress(char*, unsigned int, const char*, unsigned int):
assertion failed (z_ret == Z_OK): (4294967293 == 0)
I can get you a coredump, if that helps.
I don't know exactly what VTE is decompressing here, but I suspect it has to do with the scrollback buffer, since dumping a MB of output writes a lot of data into the buffer very quickly.
When I get some time, I'll try to track down in the source what sort of error that return code is referring to.
Thanks for this report!
> vte2.91-0.46.1 [...] I haven't tried to reproduce this with any other
No need for that; the relevant code hasn't changed since 0.40.
> assertion failed (z_ret == Z_OK): (4294967293 == 0)
That's zlib's uncompress() returning -3, i.e. Z_DATA_ERROR, which zlib documents as "if the input data was corrupted or incomplete".
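For reference, here is a small lookup of zlib's status codes as they are defined in zlib.h (Z_DATA_ERROR is -3). The table and the helper names zlib_status_name() and zlib_status_code() are hypothetical; the values are copied locally so the sketch builds without linking zlib. Real code can simply call zlib's own zError() to get a message string.

```c
#include <stddef.h>
#include <string.h>

/* Status codes copied from zlib.h (hypothetical local table, so this
 * sketch compiles without linking zlib). */
static const struct {
    int code;
    const char *name;
} zlib_codes[] = {
    {  0, "Z_OK" },
    {  1, "Z_STREAM_END" },
    {  2, "Z_NEED_DICT" },
    { -1, "Z_ERRNO" },
    { -2, "Z_STREAM_ERROR" },
    { -3, "Z_DATA_ERROR" },
    { -4, "Z_MEM_ERROR" },
    { -5, "Z_BUF_ERROR" },
    { -6, "Z_VERSION_ERROR" },
};

#define N_ZLIB_CODES (sizeof zlib_codes / sizeof zlib_codes[0])

/* Map a zlib return value to its macro name, or "unknown". */
static const char *zlib_status_name(int code)
{
    for (size_t i = 0; i < N_ZLIB_CODES; i++)
        if (zlib_codes[i].code == code)
            return zlib_codes[i].name;
    return "unknown";
}

/* Reverse lookup: store the code for a macro name and return 1,
 * or return 0 if the name is not in the table. */
static int zlib_status_code(const char *name, int *code)
{
    for (size_t i = 0; i < N_ZLIB_CODES; i++) {
        if (strcmp(zlib_codes[i].name, name) == 0) {
            *code = zlib_codes[i].code;
            return 1;
        }
    }
    return 0;
}
```

So zlib_status_name(-3) yields "Z_DATA_ERROR", the "corrupted or incomplete input" case.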
> I suspect it has to do with the scrollback buffer
That's correct: the scrollback is written to a temporary (and unlinked) file after compressing (and encrypting) each 64kB block.
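To illustrate why a big dump exercises this path so heavily: with fixed-size 64 KiB blocks, a position in the logical stream maps to a block index plus an offset inside that block. This is a minimal sketch under that assumption only; BLOCK_SIZE, struct block_pos and locate() are hypothetical names, not VTE's actual code, and VTE's scrollback stores more than the raw bytes, so real block counts will differ.

```c
#include <stdint.h>

/* Hypothetical: uncompressed size of one scrollback block (64 KiB). */
#define BLOCK_SIZE 65536u

struct block_pos {
    uint64_t block;   /* index of the 64 KiB block */
    uint32_t offset;  /* byte offset inside that block */
};

/* Split a logical stream offset into (block, offset-in-block). */
static struct block_pos locate(uint64_t stream_offset)
{
    struct block_pos p;
    p.block = stream_offset / BLOCK_SIZE;
    p.offset = (uint32_t)(stream_offset % BLOCK_SIZE);
    return p;
}
```

Under this assumption, 1 MiB of raw bytes spans 16 blocks, i.e. 16 compress/encrypt/write cycles in quick succession.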
Scrollback compression has been working absolutely fine for 2+ years throughout the world in various VTE installations. So although this could be a VTE bug, I find it much more likely to be an issue with your zlib, the encryption layer, or data corruption such as a faulty disk or faulty memory.
Do you have encryption enabled? You can tell, e.g., from gnome-terminal's About menu: "+GNUTLS" means enabled, "-GNUTLS" means disabled. If you have it enabled, then I think we can exclude the disk fault possibility, since encryption also comes with a checksum guaranteeing data integrity.
Or is there any chance you're running out of memory or disk space under /tmp?
Could you please recompile VTE after changing vtestream-file.h: replace the last argument of the compress2() call, 1, with 0 (that is, no compression), or perhaps with a higher compression level up to 9. Does this make a difference?
Could you also try with encryption both enabled and disabled (./configure --without-gnutls)? Does it crash in both cases?
> [debian] I'm running this on an ARM64 chromebook (an Acer R13), which
> might be an unusual platform, just in case that makes a difference.
Yeah I think it is going to be relevant.
(In reply to Egmont Koblinger from comment #1)
> replace the last argument of the compress2() call from 1 to 0
Irrelevant note to my past self: I should have made it an #ifdef'd constant :-)
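A minimal sketch of what such an #ifdef'd constant could look like. The macro name VTE_SCROLLBACK_ZLIB_LEVEL and the helper scrollback_zlib_level() are hypothetical; the default of 1 mirrors the level the compress2() call currently hardcodes, and the 0..9 range matches zlib's levels mentioned above.

```c
/* Hypothetical build-time knob: override with e.g.
 *   -DVTE_SCROLLBACK_ZLIB_LEVEL=0   (no compression)
 * instead of editing the compress2() call by hand. */
#ifndef VTE_SCROLLBACK_ZLIB_LEVEL
#define VTE_SCROLLBACK_ZLIB_LEVEL 1   /* current hardcoded level */
#endif

/* Reject out-of-range levels at compile time (zlib levels are 0..9). */
#if VTE_SCROLLBACK_ZLIB_LEVEL < 0 || VTE_SCROLLBACK_ZLIB_LEVEL > 9
#error "VTE_SCROLLBACK_ZLIB_LEVEL must be between 0 and 9"
#endif

/* The value that would be passed as compress2()'s last argument. */
static int scrollback_zlib_level(void)
{
    return VTE_SCROLLBACK_ZLIB_LEVEL;
}
```

The compress2() call would then pass VTE_SCROLLBACK_ZLIB_LEVEL as its last argument, and testing another level becomes a CFLAGS change rather than a source edit.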
Could you try strace and attach the output from just before the assertion fires?
Good ideas. Turning off encryption makes the problem go away.
strace shows that the last thing it does is open a tempfile, ftruncate() it to 64KiB, pwrite64() some data into the first 6120 bytes, then pread64() the whole 64KiB. The next thing it does is write the assertion message to stderr. The data being written has a short (5-byte) header followed by a whole bunch of zeros, which is the hint that something is going wrong (neither compressed nor compressed+encrypted data should be all zeros).
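The all-zeros observation can be turned into a quick debugging check. This is a minimal sketch; payload_all_zero() is a hypothetical helper, and treating the first 5 bytes as a header is an assumption based only on the strace output above, so the header length is a parameter.

```c
#include <stddef.h>

/* Hypothetical debugging helper: returns 1 if every byte after the first
 * header_len bytes is zero, 0 otherwise (including when there is no
 * payload at all). Neither zlib output nor real ciphertext should look
 * like this, so a hit points at a broken compress or encrypt step. */
static int payload_all_zero(const unsigned char *buf, size_t len,
                            size_t header_len)
{
    if (header_len >= len)
        return 0;              /* nothing after the header to judge */
    for (size_t i = header_len; i < len; i++)
        if (buf[i] != 0)
            return 0;
    return 1;
}
```

Calling it on the buffer right before the pwrite64() would flag the bad block before it ever reaches the file.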
I've traced it far enough to see that something is going wrong in the gnutls_cipher_encrypt() call inside _vte_boa_encrypt(). After gnutls_cipher_encrypt(), the data buffer is all zeros. That's clearly not going to decrypt to the correct (compressed) data, which explains why the decompression function is giving us an assert.
I have a short test program that only uses gnutls functions, which emits a random-looking ciphertext on an x86 machine, but emits an all-zero ciphertext on my arm64 chromebook, so it's clearly gnutls that's at fault here. Maybe gnutls is trying to use hardware crypto support on this CPU, and it's not really supported, or something. I'll investigate further.
I don't yet know why the decryption function isn't catching this. My current theory is that the tag is somehow correct for the all-zeros ciphertext, but of course it will decrypt to random garbage.
unsigned int z_ret;
z_ret = compress2 ((Bytef *) dst, &dstlen_ulongf, (const Bytef *) src, srclen, 1);
g_assert_cmpuint (z_ret, ==, Z_OK);
Side note: compress2() and uncompress() return int, not unsigned int, which explains the 4294967293 instead of -3. We should change z_ret to int and use g_assert_cmpint(), I think.
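The odd number in the assertion is just C's int-to-unsigned conversion, which is defined to wrap modulo UINT_MAX + 1. With the 32-bit unsigned int used here, -3 becomes 4294967296 - 3 = 4294967293. A minimal sketch; as_unsigned() is a hypothetical helper standing in for the implicit conversion that happens when the int result is stored in the unsigned int z_ret. The fix suggested above amounts to declaring `int z_ret;` and asserting with g_assert_cmpint (z_ret, ==, Z_OK).

```c
#include <limits.h>

/* zlib's Z_DATA_ERROR, copied here so the sketch builds without zlib. */
#define Z_DATA_ERROR_VALUE (-3)

/* Mimics storing an int status in an unsigned int variable: the value
 * is reduced modulo UINT_MAX + 1, so -3 becomes UINT_MAX - 2. */
static unsigned int as_unsigned(int status)
{
    return (unsigned int) status;
}
```

On a platform with 32-bit unsigned int, as_unsigned(Z_DATA_ERROR_VALUE) is 4294967293, exactly the value g_assert_cmpuint() printed.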
(In reply to Brian Warner from comment #5)
> I've traced it far enough to see that something is going wrong in the
> gnutls_cipher_encrypt() call inside _vte_boa_encrypt(). After
> gnutls_cipher_encrypt(), the data buffer is all zeros. That's clearly not
> going to decrypt to the correct (compressed) data, which explains why the
> decompression function is giving us an assert.
> I have a short test program that only uses gnutls functions, which emits a
> random-looking ciphertext on an x86 machine, but emits an all-zero
> ciphertext on my arm64 chromebook, so it's clearly gnutls that's at fault
> here. Maybe gnutls is trying to use hardware crypto support on this CPU, and
> it's not really supported, or something. I'll investigate further.
Nice investigation, thanks. I'm also somewhat glad that my guess about zlib or gnutls was correct.
I guess it's time to loop in the gnutls devs.
The gnutls functions we use (gnutls_cipher_encrypt(), gnutls_cipher_tag(), gnutls_cipher_decrypt(), gnutls_cipher_init()) return an int status code, but we never check that it equals GNUTLS_E_SUCCESS (0). We should add g_assert_cmpint() checks on the return codes after these calls, just to be on the safe side.
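A minimal sketch of the check-every-status pattern in plain C, so it builds without GLib or gnutls. check_status() and STATUS_SUCCESS are hypothetical names; STATUS_SUCCESS stands in for GNUTLS_E_SUCCESS (0). VTE itself would use g_assert_cmpint() and abort instead of returning.

```c
#include <stdio.h>

/* Stand-in for GNUTLS_E_SUCCESS, which is 0. */
#define STATUS_SUCCESS 0

/* Returns 1 if rc signals success. Otherwise logs which call failed
 * and returns 0 so the caller can bail out before using the buffer. */
static int check_status(int rc, const char *call)
{
    if (rc != STATUS_SUCCESS) {
        fprintf(stderr, "%s failed with status %d\n", call, rc);
        return 0;
    }
    return 1;
}
```

Each gnutls call's result would be passed straight through, e.g. `if (!check_status(rc, "gnutls_cipher_encrypt")) ...`, so a failing cipher call is reported at its source instead of surfacing later as a zlib error.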
It looks like hardware acceleration on aarch64 is the culprit. I'll attach a test program which reports an all-zeros ciphertext when compiled against a gnutls library with acceleration turned on, and reports the right ciphertext when gnutls is configured with --disable-hardware-acceleration.
I'll file a bug on the gnutls tracker.
Created attachment 352335
gnutls encryption test
compile with: gcc -o tls-bug -g -Wall tls-bug.c -lgnutls
gnutls bug filed: https://gitlab.com/gnutls/gnutls/issues/204
(In reply to Brian Warner from comment #5)
> My current
> theory is that the tag is somehow correct for the all-zeros ciphertext, but
> of course it will decrypt to random garbage.
It could also be that the tag isn't correct either, and it's just the buggy hw acceleration (or whatever the bug is) that makes it believe it is.
It's quite irrelevant though :)
(In reply to Christian Persch from comment #8)
> We should add
> g_assert_cmpint()'s for the return code after the calls, just to be on the
> safe side.
-- GitLab Migration Automatic Message --
This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.
You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/vte/-/issues/2406.