After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 781499 - gnome-terminal crashes on SPARC immediately after start
gnome-terminal crashes on SPARC immediately after start
Status: RESOLVED INCOMPLETE
Product: vte
Classification: Core
Component: general
unspecified
Other Solaris
: Normal normal
: ---
Assigned To: VTE Maintainers
VTE Maintainers
Depends on:
Blocks:
 
 
Reported: 2017-04-19 14:12 UTC by jiri.kralovec
Modified: 2018-06-19 18:57 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description jiri.kralovec 2017-04-19 14:12:38 UTC
Gnome-terminal crashes on SPARC immediately after start, the function we fail in is vte_terminal_determine_colors_internal.

Cause of the crash is unaligned access to memory, specifically read from VteCellAttr, which was passed unaligned from the caller.

The problem is that VteCellAttr is based on an 8byte int - guint64 - and on SPARC access to memory needs to be aligned. But the VteCellAttr we read from here is contained in VteCell, which has attribute __packed__ to make it as small as possible and VteCellAttr thus may not be aligned.

Tested on Solaris with vte version 42, but the problem is present in newer versions as well.
Comment 1 jiri.kralovec 2017-04-19 14:26:44 UTC
Possible fixes might be:
1. Not using the __packed__ attribute (on platforms that require aligned memory access), thus consuming more memory.
2. Using the __packed__ attribute on the VteCellAttr as well (on platforms that require aligned memory access), the compiler will then know that it may be unaligned and it will take care of it. But it will slow down access to the field.
3. Change type guint64 in VteCellAttr to guint32 (on platforms that require aligned memory access), access thus will be aligned and two memory accesses will be required to read/write the attribute.

I've tested number one and it fixed the issue.
I would like to fix the issue if you tell me what is the preferred way to do it.
Comment 2 Egmont Koblinger 2017-04-19 15:15:28 UTC
Thanks for the report.

I know this kind of story exists, but I never understood why. Isn't it the compiler's responsibility to take care of such situations, even paying the price of slower code rather than crashing if necessary?

Note that git master has just recently received a possibly relevant commit 279c8d5, and will soon receive a bigger relevant one in bug 779734. I'd appreciate your help in testing that patch ("vte work in progress 9a" at this moment), and fixing this issue on top of that.

With that patch, we'll have two parallel structures: VteCellAttr (intentionally not packed since they are in memory and a few bytes don't matter, speed does) and VteStreamCellAttr (intentionally packed since tons of these are written to a file).

There's still a VteCell containing a vteunistr and a VteCellAttr; this one is packed, I don't know why, I guess we could make it non-packed. This is your number one solution, right? Could you please test it on top of the aforementioned bigger changes?
Comment 3 jiri.kralovec 2017-04-19 15:28:58 UTC
Regarding the compiler's responsibility:
The compiler takes care of the situation, when the attribute __packed__ is added to the VteCellAttr as well.
As it is now, the __packed__ attribute is on VteCell but not on VteCellAttr.

In this case the compiler could deduce that VteCellAttr is affected by the __packed__ attribute on VteCell. If VteCellAttr was defined in a separate library, the compiler might not even know that it is being used in some other library in a __packed__ structure. As such a problem is not always detectable I think they just disregard it. The compiler prints warning that there might be unaligned access to memory caused.
Comment 4 jiri.kralovec 2017-04-19 15:52:56 UTC
Right, I'll test it.
(It is put as first, but the order is purely random.)
Comment 5 Egmont Koblinger 2017-04-19 16:10:17 UTC
In the mean time I'm wondering if we should go for your 3rd proposal instead. I'll do some performance tests on x86. The thing is, with the aforementioned patch VteCellAttr grows to 12 packed / 16 non-packed bytes, and in turn, VteCell would be 16 packed / 24 non-packed, that is, 8 bytes wasted per cell. That's 40kB per terminal at the default size (the height is to be rounded up to a power of two, hence 32x80 cells, and doubled for the alternate screen), and obviously more for bigger sizes, e.g. half a megabyte for _my_ fullscreen terminal. Not something I'd be that super happy to waste on padding for each terminal.
Comment 6 Christian Persch 2017-04-19 19:02:35 UTC
1st or 2nd proposal seem best to me, limiting the change to the archs where it's required.
Comment 7 jiri.kralovec 2017-04-19 19:21:35 UTC
The third case can be limited to just the necessary architectures too.
The fields in VteCellAttr are bitfields, now they are stored in one or two instances of guint64, but they can be divided into twice as much guint32 without impact on the code that uses them. And guint32 may be just used on the affected architectures with guint64 being used on the rest.

The first two are maintainable more easily though, with them you can edit the VteCellAttr without thinking about it consisting of guint32s.

The first two options then additionally require ifdefing or removal of the static asserts on size.
Comment 8 Egmont Koblinger 2017-04-19 19:26:49 UTC
I'd prefer to avoid arch-dependent branches, if possible.
Comment 9 Egmont Koblinger 2017-04-19 20:22:00 UTC
I've added a "work in progress 11" version of the patch in the aforementioned bug, I had it pending on my computer for a few days now. It changes these structures compared to wip9a.

Let's see what we can do with this bug on top of that patch...
Comment 10 Egmont Koblinger 2017-04-19 20:51:52 UTC
Rough avg times of cat'ing my favorite large test file:

wip11 of hyperlink (20 bytes):  5.2s

On top of that, proposed fixes to this bug:
1. (24 bytes):                  5.0s
2-3. (16 bytes) each:           5.8s

I have no firm opinion whether an additional memory consumption of 50-500 kB per widget (16 -> 24 bytes per cell), or a ~15% slowdown when processing tons of data (5.0s -> 5.8s) is worse. Given the current speed of VTE (ever since that 10x speedup one-liner a few years ago) and the current amount of memory in today's computers, I believe either one is still pretty good.

Actually the current wip11 (20 bytes, 5.2s) might be the optimal compromise - if it wasn't the buggy one :)
Comment 11 Egmont Koblinger 2017-04-28 00:02:34 UTC
Any preference which one to go for?

We might quickly implement colored underline as per https://github.com/kovidgoyal/kitty/blob/master/protocol-extensions.asciidoc, and then the fastest solution no longer wastes those 4 bytes :-)
Comment 12 jiri.kralovec 2017-05-04 14:15:55 UTC
It took me some time to get my system to compile the newest version of vte, sorry about that.

It seems that the latest version already contains changes from "vte, work in progress 14" from bug 779734, is that correct?

I was able to run the test application (testvte) after removing _VTE_GNUC_PACKED from VteCell and removing the static assert below it.

The _VTE_GNUC_PACKED attribute is also present VteStreamCellAttr, but I think that the test application doesn't run code that would use it, is that correct?
Is there a way to easily test this too?
Comment 13 Egmont Koblinger 2017-05-04 14:52:22 UTC
(In reply to jiri.kralovec from comment #12)

> It seems that the latest version already contains changes from "vte, work in
> progress 14" from bug 779734, is that correct?

Yes.

> The _VTE_GNUC_PACKED attribute is also present VteStreamCellAttr, but I
> think that the test application doesn't run code that would use it, is that
> correct?

Nope, this is also in use. Set the number of scrollback lines high enough (or maybe it's infinity with the test app, can't recall) and produce colored output so that in the data stream that scrolls out (but is still available by scrolling back with the scrollbar) the color changes at like 10000 or so times.
Comment 14 jiri.kralovec 2017-05-10 12:32:15 UTC
In order to be able to use the test application (not just start it) I had to overcome another obstacle, so it again took some time.

I tested the test application by printing more than 10000 lines with alternating colors (just two colors). And the application works well.
All that I printed was available by scrolling back.

So it is enough to remove _VTE_GNUC_PACKED from VteCell and the static assert below it.
VteStreamCellAttr has the _VTE_GNUC_PACKED directly on itself, so the compiler uses appropriate instructions to access its members (making unaligned access possible).
Comment 15 jiri.kralovec 2017-05-29 14:13:20 UTC
Hi,

can you give me some response to my previous comment?
Should I suggest a specific fix in a patch?
Do you have a preferred way to fix this?
For me it would now be removing _VTE_GNUC_PACKED from VteCell.
Comment 16 John Paul Adrian Glaubitz 2017-07-30 13:09:23 UTC
(In reply to Egmont Koblinger from comment #2)
> I know this kind of story exists, but I never understood why. Isn't it the
> compiler's responsibility to take care of such situations, even paying the
> price of slower code rather than crashing if necessary?

The compiler does take care of these situations and makes sure the alignment is correct. However, the compiler's mechanism fails here if you are using pointers to objects, for example, and cast them to a different type.

These alignment issues are usually a result of bad code. At least when you consider what the ISO standard for C says.

The only way to avoid these crashes would be to add support to the kernel to emulate those unaligned accesses.
Comment 17 Christian Persch 2018-02-27 11:16:51 UTC
On git master, VteCellAttr is packed too; that corresponds to the suggestion (2) in comment 0. Can you check that this is fixed on git master?
Comment 18 John Paul Adrian Glaubitz 2018-06-19 18:57:48 UTC
@Christian: Can I at least test whether it has been resolved before you close this? Sorry for not responding earlier, I missed the earlier mail.