GNOME Bugzilla – Bug 754887
Loading a binary file causes a crash
Last modified: 2015-09-12 09:42:17 UTC
Created attachment 311156
example file
Some programs crash when trying to display non-UTF-8 files (mainly binary files):
> Gtk:ERROR:gtktextsegment.c:195:_gtk_char_segment_new: assertion failed: (gtk_text_byte_begins_utf8_char (text))
> Aborted (core dumped)
This affects gedit and, more importantly, gnome-terminal (where it's not uncommon to accidentally `cat` a binary file).
gtk3 3.17.8.r116.g4993b02 (git master)
The crash is inside GtkCharSegment, which is used by the GtkTextBTree, i.e. GtkTextBuffer — something that is not used by gnome-terminal, so it has no bearing on any gnome-terminal crash. GtkTextView is a GTK+ widget, and as such it has to receive UTF-8 text only. It's up to the caller to ensure that the text is UTF-8, as adding UTF-8 validation to all text-related API would be excessively expensive. If gedit is crashing because of an invalid UTF-8 file, then the issue lies in gedit and/or GtkSourceView. Re-assigning to the right component.
g_utf8_validate() is called in gtk_text_buffer_emit_insert() and returns TRUE. But later, gtk_text_byte_begins_utf8_char() returns FALSE. Anyway, opening binary files with GtkSourceFileLoader (and thus gedit) is known to be buggy. Re-assigning to GtkSourceView. The backtrace:
+ Trace 235435
Seems like GtkSourceView is fine – bisected down to 3188b8e in glib:
commit 3188b8ee791a38ac3dd7e477f30761344442f745
Author: Mikhail Zabaluev <mikhail.zabaluev@gmail.com>
Date:   Tue Oct 14 01:18:57 2014 +0300
    Optimized branching in g_utf8_validate()
Ok thanks for the git bisect. See bug #738504.
Can you isolate the sequence passed to g_utf8_validate() that's causing this?
(In reply to Mikhail Zabaluev from comment #5)
> Can you isolate the sequence passed to g_utf8_validate() that's causing this?
I don't know which sequence causes the crash, but here are a few examples that are mistakenly accepted as valid UTF-8:
* d2 a1 2f 03 b2 88
* 55 9d b7 85 86 58
* 5b 01 28 88 91 24
* a6 a3 30 64 06 03
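For anyone wanting to cross-check the four sequences listed above without depending on the glib build under suspicion, a strict validator can be sketched in standalone C. This is only an illustration of the well-formedness rules from RFC 3629, not glib's g_utf8_validate(); the names `is_valid_utf8`, `samples` and `count_rejected_samples` are made up for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (NOT glib's g_utf8_validate()): returns true only if
 * buf[0..len) is well-formed UTF-8 as described in RFC 3629. */
bool is_valid_utf8(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    size_t i = 0;
    while (i < len) {
        unsigned char b = buf[i];
        size_t extra;        /* continuation bytes expected after the lead */
        unsigned long min;   /* smallest code point allowed for this length */
        unsigned long cp;    /* code point being decoded */

        if (b < 0x80) {                      /* plain ASCII */
            i++;
            continue;
        } else if ((b & 0xE0) == 0xC0) {     /* 110xxxxx: 2-byte sequence */
            extra = 1; min = 0x80;    cp = b & 0x1F;
        } else if ((b & 0xF0) == 0xE0) {     /* 1110xxxx: 3-byte sequence */
            extra = 2; min = 0x800;   cp = b & 0x0F;
        } else if ((b & 0xF8) == 0xF0) {     /* 11110xxx: 4-byte sequence */
            extra = 3; min = 0x10000; cp = b & 0x07;
        } else {
            return false;  /* stray continuation byte or invalid lead byte */
        }

        if (len - i - 1 < extra)
            return false;                    /* sequence is truncated */

        for (size_t k = 1; k <= extra; k++) {
            unsigned char c = buf[i + k];
            if ((c & 0xC0) != 0x80)
                return false;                /* not a 10xxxxxx continuation */
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < min)                        /* overlong encoding */
            return false;
        if (cp > 0x10FFFF)                   /* beyond the Unicode range */
            return false;
        if (cp >= 0xD800 && cp <= 0xDFFF)    /* UTF-16 surrogate */
            return false;

        i += extra + 1;
    }
    return true;
}

/* The four byte sequences quoted in the comment above. */
const unsigned char samples[4][6] = {
    { 0xd2, 0xa1, 0x2f, 0x03, 0xb2, 0x88 },
    { 0x55, 0x9d, 0xb7, 0x85, 0x86, 0x58 },
    { 0x5b, 0x01, 0x28, 0x88, 0x91, 0x24 },
    { 0xa6, 0xa3, 0x30, 0x64, 0x06, 0x03 },
};

/* Returns how many of the samples the strict validator rejects. */
int count_rejected_samples(void)
{
    int rejected = 0;
    for (size_t s = 0; s < 4; s++) {
        if (!is_valid_utf8(samples[s], sizeof samples[s]))
            rejected++;
    }
    return rejected;
}
```

Each sample contains a byte of the form 10xxxxxx (b2, 9d, 88 and a6 respectively) in a position where a new character has to start, so a strict validator should reject all four.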
From the backtrace, the sequence is: "\232\251I"
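To make the failing assertion concrete: "\232\251I" uses octal escapes, i.e. the bytes 0x9A 0xA9 0x49. The sketch below assumes that gtk_text_byte_begins_utf8_char() boils down to testing whether a byte is something other than a 10xxxxxx continuation byte; that is my reading of the check, not something confirmed in this report, and `byte_begins_utf8_char`, `trace_bytes` and `first_char_start` are hypothetical names.

```c
#include <stdbool.h>

/* Assumed equivalent of the check behind gtk_text_byte_begins_utf8_char():
 * a byte can begin a UTF-8 character unless it is a 10xxxxxx continuation. */
bool byte_begins_utf8_char(unsigned char byte)
{
    return (byte & 0xC0) != 0x80;
}

/* "\232\251I" from the backtrace, written out as bytes: 0x9A 0xA9 0x49. */
const unsigned char trace_bytes[3] = { 0232, 0251, 'I' };

/* Returns the index of the first byte that can begin a character,
 * or -1 if there is none. */
int first_char_start(const unsigned char *buf, int len)
{
    for (int i = 0; i < len; i++) {
        if (byte_begins_utf8_char(buf[i]))
            return i;
    }
    return -1;
}
```

Under that assumption the first two bytes are both continuation bytes, so a segment starting at this text would trip the assertion, and only the trailing 'I' could start a character.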
*** This bug has been marked as a duplicate of bug 738504 ***