GNOME Bugzilla – Bug 511096
g_markup_escape_text crash on some translations
Last modified: 2008-06-12 09:05:00 UTC
$ echo $LANG uk_UA.UTF-8 $ gdb xchat
+ Trace 186235
Have this pretty long time, 2.14.x is affected also.
This is almost certainly an application bug, but there is no way of telling which application or what it was doing with the above backtrace. (Basically, the application passed a bad pointer to GTK+, GTK+ passed it to GLib, Glib tried to read it and crashed.) See: http://live.gnome.org/GettingTraces for information about getting useful backtraces.
Sorry, here is full backtrace for glib-2.15.3 Program received signal SIGSEGV, Segmentation fault.
+ Trace 186242
Thread 47955030295424 (LWP 28866)
and thats string from xchat.mo : Черга надсилання у мережу: %d байтів.
Looks like for some reason your string gets counted as too long: frame 1 has length=62, whereas I come up with 61 as the length for the string.
This comment is a me too, although I'm more constructure as I'm attaching a trivial test program that causes g_markup_escape_text to crash. All it needs is non-utf8 input, as it might skip the terminating NUL character in this case. The utf8 requirement is not documented.
Created attachment 110850 [details] trivial test program that crashes
It is not particularly surprising that you can cause a crash by violating the explicit documentation: * @text: some valid UTF-8 text Adding explicit validation to each entry point that expects valid utf-8 would be prohibitively expensive. That doesn't mean that you have a case if you pass invalid utf-8 and it crashes...
It is not explicitly documented for g_markup_printf_escaped () which evolution was using, albeit it is mentioned that it relies on g_markup_escape_text(). But as it seems it is trivial to crash evolution (and probably other apps), just send an invitation that states it uses UTF8, and use a non-utf8 character near the end that makes the utf8 parser to skip the terminating NUL character. Easy, let's see some other Dos targets. If append_escaped_text() used g_utf8_find_next_char() instead of g_utf8_next_char() this case would be closed. I would not expect character set problems to cause crash even if they are wrongly used. I'm not saying it should be completely validated on entry into these subroutines (that would really be prohibitive), but using g_utf8_find_next_char() which also takes care about possible NUL characters would not hurt performance that badly.
> But as it seems it is trivial to crash evolution (and probably other apps), > just send an invitation that states it uses UTF8, and use a non-utf8 character > near the end that makes the utf8 parser to skip the terminating NUL character. If you write an 'outward facing' application like a mail client, you should better read the api docs carefully and validate the string to be utf8 before passing it into a function that expects valid utf8...
(In reply to comment #9) > > But as it seems it is trivial to crash evolution (and probably other apps), > > just send an invitation that states it uses UTF8, and use a non-utf8 character > > near the end that makes the utf8 parser to skip the terminating NUL character. > > If you write an 'outward facing' application like a mail client, you should > better read the api docs carefully and validate the string to be utf8 before > passing it into a function that expects valid utf8... > It is your call to resolve this bug to WONTFIX, however let me point out that even you - as the author of the given code - had difficulty of diagnosing a crash triggered by this problem. (see comment #4). Do you think a less skilled developer has a chance of seeing what went wrong? And also, the input string to this call is a NUL terminated C string, the input is properly NUL terminated, but still the function crashes as it skips the terminating NUL. I would argue that NUL termination is stronger property of C strings than being UTF-8 encoded. UTF8 support on UNIX relies on the fact that programs do not "care" about encoding at all, and only in rare cases when the number of characters matters over the number of bytes a string occupies do we need extra code to handle utf8 strings. The kernel does not even care about character sets, as long as '/' and NUL characters are represented the same as in ASCII, the kernel will happily work with any kind of encoding. I would be suprised if iconv crashed in invalid input and just as surprised if g_strsplit() crashed if a non-utf8 string is passed to it. And also, I'm not an evolution developer, merely a user of Evolution and a user of Glib in other apps. And given the fact that even high-profile applications (like evolution) does it wrong, I wonder how many applications do it wrong too. So, all in all, please reconsider.
I cant believe my self. You just close a bug with a reason that you just to lazy to fix. Let me explain. As a developer you dont care about Strings showed in GUI, mostly there are people who make translations for your application. And you don't know if you application crash because some GNU tool for translation generates bad localizations files. As it seems to be in this bug as i described for xchat. And even if a crash does happen, normal user don't see any warnings or errors, its just crash. If you will provide large people group with a API, you have to take care about code quality. And in this case you peace of code sucks. So you can leave this bug closed, who cares, but think about your strategy once more. What services do you provide to us.
Normaly, a good developer, who "write an 'outward facing' application like a mail client" will create try and catch scope, and if it cant convert UTF-8 properly it would create empty string, bad converted string, or what ever. But it wouldn't crash whole application with no warnings or reasons.
(In reply to comment #11) > I cant believe my self. You just close a bug with a reason that you just to > lazy to fix. > I think your attitude will not help to get this bug fixed. Matthias does a great job, probably in his free time, I think we should thank him for doing this work. It is his call to decide that a bug is a bug or a "feature", he currently thinks that performance in this case is more important. I have a right to disagree and you have a right to disagree too, but I think the only sane way to do something is to try convincing him with technical arguments. Calling someone lazy in a free software project is not nice. However the technical argument holds: a GUI application cannot be expected to validate its own strings, when those strings come from a translation via gettext. And I guess the developer for a given application will be blamed, not the translator, and I doubt developers are fluent enough in the large number of GNOME supported languages to decide whether a given translation file is in the proper encoding or not.
That's true, to disagree is my right here. Let me explain this once again. If you write a kernel, and some userland application crash because kernel doesn't handle something good who you try to blame ? Kernel developer, or application developer ? Same here. Glib is kind service library to big sort of applications. Crash defenetively happens because glib doesn't know how to handle strings. So who i have to blame ? I don't expect you build some validation in this function, as you sad because of performance issues. But i would expect that function doesn't crash at this issue, it could create wrong or even empty string as result. Or at least give a warning befor it crash. As i described in xchat error is coming from gtk, as it just try simple add text set tooltip to a widget. So nothing special is done in application code self. The localizations file are created with gnu gettext tools. And there happens something what cause error. So who i should start to blame ? Gnu gettext devs, xchat devs, gtk+ devs or glib ? Or translations writer ?
And yes, in my opinion "free sofware projects" doesn't mean you don't have to care about a code quality.
Code quality has many aspects, this question is not necessariliy a code quality issue. The kernel-userspace interface is different, the kernel MUST not crash even if the most garbage parameters are passed to it, as that would be a stability problem as applications do call the kernel with garbage as parameters. Glib on the other hand is not a sealed off part of your application, it's not a black box that should handle all garbage thrown into it. It can be expected that an application crashes if you call g_strsplit(NULL). When you are presented with an API, that API might have requirements from the calling application. This particular API has a requirement that is difficult to meet sometimes and failing to meet this requirement the application as a whole crashes. The current question is an ease of use vs. performance, I think in this very case the ease of use is more important than performance, I doubt g_markup_escape_text() would be used to produce immense amounts of marked up text, except for displaying on screen in a label. Matthias, would a patch to fix this issue be accepted?