Bug 511096 – g_markup_escape_text crash on some translations

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 511096 - g_markup_escape_text crash on some translations


Summary:	g_markup_escape_text crash on some translations


Status:	RESOLVED WONTFIX

Product:	glib
Classification:	Platform
Component:	general
Version:	2.15.x
Hardware:	Other Linux

Importance:	Normal major
Target Milestone:	---
Assigned To:	gtkdev
QA Contact:	gtkdev

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2008-01-21 18:42 UTC by SchAmane
Modified:	2008-06-12 09:05 UTC

See Also:
GNOME target:	---
GNOME version:	2.21/2.22

Attachments
trivial test program that crashes (198 bytes, text/plain) 2008-05-13 12:16 UTC, Balazs Scheidler	Details

Description SchAmane 2008-01-21 18:42:15 UTC

$ echo $LANG
uk_UA.UTF-8

$ gdb xchat

+ Trace 186235

#0 g_markup_escape_text
from /usr/lib/libglib-2.0.so.0
#1 ??
from /usr/lib/libgtk-x11-2.0.so.0
#2 g_object_set_valist
from /usr/lib/libgobject-2.0.so.0
#3 g_object_set
from /usr/lib/libgobject-2.0.so.0


Have this pretty long time, 2.14.x is affected also.

Comment 1 Owen Taylor 2008-01-21 19:02:05 UTC

This is almost certainly an application bug, but there is no way
of telling which application or what it was doing with the above backtrace.

(Basically, the application passed a bad pointer to GTK+, GTK+
passed it to GLib, Glib tried to read it and crashed.)

See:

http://live.gnome.org/GettingTraces

for information about getting useful backtraces.

Comment 2 SchAmane 2008-01-21 19:54:51 UTC

Sorry, here is full backtrace for glib-2.15.3

Program received signal SIGSEGV, Segmentation fault.

+ Trace 186242

Thread 47955030295424 (LWP 28866)

#0 append_escaped_text
at gmarkup.c line 1946
#1 IA__g_markup_escape_text
at gmarkup.c line 2016
#2 gtk_widget_set_property
at gtkwidget.c line 2444
#3 object_set_property
at gobject.c line 697
#4 IA__g_object_set_valist
at gobject.c line 1130
#5 IA__g_object_set
at gobject.c line 1212
#6 IA__gtk_widget_set_tooltip_text
at gtkwidget.c line 9355
#7 IA__gtk_tooltips_set_tip
at gtktooltips.c line 269
#8 add_tip
at gtkutil.c line 502
#9 fe_set_throttle
at fe-gtk.c line 743
#10 server_flush_queue
at server.c line 838
#11 server_connect
at server.c line 1683
#12 servlist_connect
at servlist.c line 563
#13 servlist_auto_connect
at servlist.c line 627
#14 xchat_auto_connect
at xchat.c line 681
#15 g_idle_dispatch
at gmain.c line 4142
#16 g_main_dispatch
at gmain.c line 2064
#17 IA__g_main_context_dispatch
at gmain.c line 2616
#18 g_main_context_iterate
at gmain.c line 2697
#19 IA__g_main_loop_run
at gmain.c line 2905
#20 IA__gtk_main
at gtkmain.c line 1163
#21 fe_main
at fe-gtk.c line 291
#22 main
at xchat.c line 1065

Comment 3 SchAmane 2008-01-21 20:09:05 UTC

and thats string from xchat.mo :
Черга надсилання у мережу: %d байтів.

Comment 4 Matthias Clasen 2008-01-25 05:44:29 UTC

Looks like for some reason your string gets counted as too long:
frame 1 has length=62, whereas I come up with 61 as the length for the string.

Comment 5 Balazs Scheidler 2008-05-13 12:16:16 UTC

This comment is a me too, although I'm more constructure as I'm attaching a trivial test program that causes g_markup_escape_text to crash.

All it needs is non-utf8 input, as it might skip the terminating NUL character in this case.

The utf8 requirement is not documented.

Comment 6 Balazs Scheidler 2008-05-13 12:16:44 UTC

Created attachment 110850 [details]
trivial test program that crashes

Comment 7 Matthias Clasen 2008-05-15 05:28:48 UTC

It is not particularly surprising that you can cause a crash by violating the explicit documentation:

* @text: some valid UTF-8 text

Adding explicit validation to each entry point that expects valid utf-8 would be prohibitively expensive. That doesn't mean that you have a case if you pass invalid utf-8 and it crashes...

Comment 8 Balazs Scheidler 2008-05-15 09:12:47 UTC

It is not explicitly documented for g_markup_printf_escaped () which evolution was using, albeit it is mentioned that it relies on g_markup_escape_text().

But as it seems it is trivial to crash evolution (and probably other apps), just send an invitation that states it uses UTF8, and use a non-utf8 character near the end that makes the utf8 parser to skip the terminating NUL character.

Easy, let's see some other Dos targets.

If append_escaped_text() used g_utf8_find_next_char() instead of g_utf8_next_char() this case would be closed.

I would not expect character set problems to cause crash even if they are wrongly used. I'm not saying it should be completely validated on entry into these subroutines (that would really be prohibitive), but using g_utf8_find_next_char() which also takes care about possible NUL characters would not hurt performance that badly.

Comment 9 Matthias Clasen 2008-06-10 20:06:48 UTC

> But as it seems it is trivial to crash evolution (and probably other apps),
> just send an invitation that states it uses UTF8, and use a non-utf8 character
> near the end that makes the utf8 parser to skip the terminating NUL character.

If you write an 'outward facing' application like a mail client, you should better read the api docs carefully and validate the string to be utf8 before passing it into a function that expects valid utf8...

Comment 10 Balazs Scheidler 2008-06-11 09:41:10 UTC

(In reply to comment #9)
> > But as it seems it is trivial to crash evolution (and probably other apps),
> > just send an invitation that states it uses UTF8, and use a non-utf8 character
> > near the end that makes the utf8 parser to skip the terminating NUL character.
> 
> If you write an 'outward facing' application like a mail client, you should
> better read the api docs carefully and validate the string to be utf8 before
> passing it into a function that expects valid utf8...
> 

It is your call to resolve this bug to WONTFIX, however let me point out that even you - as the author of the given code - had difficulty of diagnosing a crash triggered by this problem. (see comment #4). Do you think a less skilled developer has a chance of seeing what went wrong? 

And also, the input string to this call is a NUL terminated C string, the input is properly NUL terminated, but still the function crashes as it skips the terminating NUL. I would argue that NUL termination is stronger property of C strings than being UTF-8 encoded.

UTF8 support on UNIX relies on the fact that programs do not "care" about encoding at all, and only in rare cases when the number of characters matters over the number of bytes a string occupies do we need extra code to handle utf8 strings. The kernel does not even care about character sets, as long as '/' and NUL characters are represented the same as in ASCII, the kernel will happily work with any kind of encoding.

I would be suprised if iconv crashed in invalid input and just as surprised if g_strsplit() crashed if a non-utf8 string is passed to it.

And also, I'm not an evolution developer, merely a user of Evolution and a user of Glib in other apps. And given the fact that even high-profile  applications (like evolution) does it wrong, I wonder how many applications do it wrong too.

So, all in all, please reconsider.

Comment 11 SchAmane 2008-06-11 16:25:08 UTC

I cant believe my self. You just close a bug with a reason that you just to lazy to fix.
Let me explain. As a developer you dont care about Strings showed in GUI, mostly there are people who make translations for your application. And you don't know if you application crash because some GNU tool for translation generates bad localizations files. As it seems to be in this bug as i described for xchat.

And even if a crash does happen, normal user don't see any warnings or errors, its just crash.

If you will provide large people group with a API, you have to take care about code quality. And in this case you peace of code sucks. So you can leave this bug closed, who cares, but think about your strategy once more. What services do you provide to us.

Comment 12 SchAmane 2008-06-11 16:30:20 UTC

Normaly, a good developer, who "write an 'outward facing' application like a mail client" will create try and catch scope, and if it cant convert UTF-8 properly it would create empty string, bad converted string, or what ever. But it wouldn't crash whole application with no warnings or reasons.

Comment 13 Balazs Scheidler 2008-06-12 08:12:39 UTC

(In reply to comment #11)
> I cant believe my self. You just close a bug with a reason that you just to
> lazy to fix.
> 

I think your attitude will not help to get this bug fixed. Matthias does a great job, probably in his free time, I think we should thank him for doing this work.

It is his call to decide that a bug is a bug or a "feature", he currently thinks that performance in this case is more important. I have a right to disagree and you have a right to disagree too, but I think the only sane way to do something is to try convincing him with technical arguments.

Calling someone lazy in a free software project is not nice.

However the technical argument holds: a GUI application cannot be expected to validate its own strings, when those strings come from a translation via gettext. And I guess the developer for a given application will be blamed, not the translator, and I doubt developers are fluent enough in the large number of GNOME supported languages to decide whether a given translation file is in the proper encoding or not.

Comment 14 SchAmane 2008-06-12 08:32:24 UTC

That's true, to disagree is my right here.
Let me explain this once again. If you write a kernel, and some userland application crash because kernel doesn't handle something good who you try to blame ? Kernel developer, or application developer ?
Same here. Glib is kind service library to big sort of applications. Crash defenetively happens because glib doesn't know how to handle strings. So who i have to blame ?
I don't expect you build some validation in this function, as you sad because of performance issues. But i would expect that function doesn't crash at this issue, it could create wrong or even empty string as result. Or at least give a warning befor it crash.

As i described in xchat error is coming from gtk, as it just try simple add text set tooltip to a widget. So nothing special is done in application code self.
The localizations file are created with gnu gettext tools. And there happens something what cause error. So who i should start to blame ? Gnu gettext devs, xchat devs, gtk+ devs or glib ? Or translations writer ?

Comment 15 SchAmane 2008-06-12 08:35:50 UTC

And yes, in my opinion "free sofware projects" doesn't mean you don't have to care about a code quality.

Comment 16 Balazs Scheidler 2008-06-12 09:05:00 UTC

Code quality has many aspects, this question is not necessariliy a code quality issue. The kernel-userspace interface is different, the kernel MUST not crash even if the most garbage parameters are passed to it, as that would be a stability problem as applications do call the kernel with garbage as parameters.

Glib on the other hand is not a sealed off part of your application, it's not a black box that should handle all garbage thrown into it. It can be expected that an application crashes if you call g_strsplit(NULL). When you are presented with an API, that API might have requirements from the calling application. This particular API has a requirement that is difficult to meet sometimes and failing to meet this requirement the application as a whole crashes.

The current question is an ease of use vs. performance, I think in this very case the ease of use is more important than performance, I doubt g_markup_escape_text() would be used to produce immense amounts of marked up text, except for displaying on screen in a label.

Matthias, would a patch to fix this issue be accepted?