GNOME Bugzilla – Bug 141124
Crash when displaying "Open File" dialog box (WIN32)
Last modified: 2004-12-22 21:47:04 UTC
1. Activate the "Open File" dialog box; 2. Navigate to a directory containing files with Chinese names (containing two or more Chinese characters); 3. The crash occurs, showing a dialog box with the following message: GLib-ERROR **: gmem.c:140: failed to allocate 2147483649 bytes aborting...
Created attachment 27100 [details] crash
*** Bug 141123 has been marked as a duplicate of this bug. ***
It would be essential to know what version of GTK+ this bug is about.
Sorry, I forgot it. I tried all the 2.2.x and 2.4.x binaries I could find on the net and they all have this bug. It seems that all versions of GTK+ 2 have this bug.
I guess this is on a Chinese version of Windows, where the system codepage is multibyte? Sorry, I don't have access to any such machine, so I can't really duplicate. Just having Chinese characters in file names on an English Windows 2000 doesn't cause any crash. (But the files in question don't work in the file chooser nor the file selector either. This is to be expected, as GLib and GTK don't use wide-char APIs, so they only handle file names that can be expressed in the system codepage.) Somebody needs to debug this problem on a Chinese (or Korean or Japanese) Windows box.
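To illustrate the codepage limitation described in the comment above, here is a minimal Python sketch. It is not GLib code: Python's `cp1252` and `cp932` codecs stand in for the Windows system codepages 1252 and 932, and the helper name `to_system_codepage` is hypothetical.

```python
def to_system_codepage(name: str, codepage: str) -> bytes:
    """Encode a file name the way a non-wide-char API has to see it.

    Characters that the codepage cannot express are replaced with '?',
    which is an assumption about how the lossy conversion behaves.
    """
    return name.encode(codepage, errors="replace")


# A Japanese file name, written with escapes: katakana "puroguramu".
japanese_name = "\u30d7\u30ed\u30b0\u30e9\u30e0"

# On a Western (codepage 1252) machine every character is lost.
western = to_system_codepage(japanese_name, "cp1252")

# On a Japanese (codepage 932) machine the name survives the round trip.
japanese = to_system_codepage(japanese_name, "cp932")

print(western)
print(japanese.decode("cp932") == japanese_name)
```

This would be consistent with file names being unusable on an English Windows 2000 without causing any crash there.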
I don't have any experience of debugging the GTK lib on Windows, and I am not familiar with the GTK source either. So could you give me some hints to get started with it?
*** Bug 143196 has been marked as a duplicate of this bug. ***
*** Bug 144736 has been marked as a duplicate of this bug. ***
See also 141841, which may be a duplicate (and for which the reporter says that it only crashes on some of the directories containing Korean characters).
*** Bug 141841 has been marked as a duplicate of this bug. ***
*** Bug 148054 has been marked as a duplicate of this bug. ***
Created attachment 30124 [details] gdir.exe Please run the attached executable (from the command line) in the folder where the problematic file names are. The program should just list the file names. I don't expect it to fail, but at least that will tell me that the problem is not in gdir.c.
I guess one way for me to debug this on my English Windows machine would be to hack together a special version of GLib that would pretend to have a system codepage of 949 (Korean), for instance, and see what happens. Hmm.
gdir.exe runs okay on my Korean Windows XP system, with the result including the filename that causes the problem in GIMP.
gdir.exe runs okay on my Chinese Windows XP system, with the result including the filename that causes the problem in GIMP.
gdir.exe runs OK on my Simplified Chinese Windows XP system, also.
OK, good, so the problem is not in the g_dir_* functions then.
Using the latest pygtk release, I do the following to crash with the same error:

    import gtk
    gtk.FileChooserDialog().run()

When I navigate to one of the folders that makes the dialog crash, the simple Windows debugger gives me the following call stack:

77e398ec ffffffff 00000003 SharedUserData!SystemCallStub+0x4
WARNING: Stack unwind information not available. Following frames may be wrong.
00000003 77e8f3b0 ffffffff ntdll!ZwTerminateProcess+0xc
00000003 77be7ad9 00000003 kernel32!ExitProcess+0x12
00000016 00000018 00a0eae7 msvcrt!strerror+0x2719
00a0c0e1 00000004 00a0c0c0 msvcrt!abort+0xe
00a0c0e1 00000004 00a0c0c0 libglib_2_0_0!g_log+0x1a
80000001 00df3498 00000000 libglib_2_0_0!g_malloc+0x3e
00db62e8 ffffffff 00db58f0 libglib_2_0_0!g_utf8_collate_key+0xaf
00db5ab0 00db5fb8 00000007 libgtk_win32_2_0_0!gtk_file_info_get_display_key+0x31
00db58f0 0012eea0 0012ee90 libgtk_win32_2_0_0!gtk_file_chooser_dialog_new_with_backend+0x6ce9
0012eee0 014557e0 0012f090 libgtk_win32_2_0_0!gtk_tree_model_sort_new_with_model+0x298e
01455718 0000001c 00000008 libglib_2_0_0!g_qsort_with_data+0x274
009c5d1c 012650c0 0012f090 libglib_2_0_0!g_array_sort_with_data+0x27
01455198 00db4f48 00000000 libgtk_win32_2_0_0!gtk_tree_model_sort_new_with_model+0x2c1e
01455198 00000000 00000000 libgtk_win32_2_0_0!gtk_tree_model_sort_convert_iter_to_child_iter+0x376
01455198 0012f1e0 00db68c8 libgtk_win32_2_0_0!gtk_tree_model_sort_new_with_model+0x142b
01455198 0012f1e0 00db68c8 libgtk_win32_2_0_0!gtk_tree_model_get_iter+0xa3
00db0208 01455198 00000000 libgtk_win32_2_0_0!gtk_tree_view_set_model+0x13a
00c41608 0012f2a8 00000000 libgtk_win32_2_0_0!gtk_file_chooser_dialog_new_with_backend+0x71ba
00c41608 00db7080 0012f2a8 libgtk_win32_2_0_0!gtk_file_chooser_dialog_new_with_backend+0x7450
00c41608 00db7080 0012f2a8 libgtk_win32_2_0_0!gtk_file_chooser_get_current_folder_uri+0x14f
00c41608 0144aec8 00000001 libgtk_win32_2_0_0!gtk_file_chooser_dialog_new_with_backend+0x25cc
00db0208 00deb660 00db1d28 libgtk_win32_2_0_0!gtk_file_chooser_dialog_new_with_backend+0x8f88
00db1bc8 00000000 00000003 libgtk_win32_2_0_0!gtk_marshal_VOID__UINT_STRING+0x199e
00db1bc8 00000000 00000003 libgobject_2_0_0!g_closure_invoke+0x98
00da5710 00000000 00db0208 libgobject_2_0_0!g_signal_emit_by_name+0xa7d
00db0208 00000091 00000000 libgobject_2_0_0!g_signal_emit_valist+0x6c7
00db0208 00000091 00000000 libgobject_2_0_0!g_signal_emit+0x1a
00db0208 0144e628 00db1d28 libgtk_win32_2_0_0!gtk_tree_view_row_activated+0x45
00db0208 01440510 009c55b8 libgtk_win32_2_0_0!gtk_tree_view_get_type+0x37c7

Here is the memory dump at the address passed to g_utf8_collate_key:

00db58f0 38 4b db 00 02 00 00 00 98 d7 92 00 a0 cd d7 00 8K..............
00db5900 3f 00 00 00 f8 69 db 00 f8 4c db 00 f8 15 45 01 ?....i...L....E.

This does not look like any of the files in my directory. I will try to isolate a single file.
Created attachment 30437 [details] Crashes the file chooser when browsing this hierarchy Adding a small zip file of a directory structure that always ends up crashing the GTK program. XP being a single worldwide binary, I hope that it will induce a similar crash on an English XP.
XP might be a single worldwide binary, but still each machine has just one system codepage, which is either single-byte or multibyte. I assume this problem occurs only on machines with a multibyte system codepage. I doubt the filenames in that zip file got unpacked correctly on my (codepage 1252) machine... What are the file names supposed to be, some CJK characters? Is there any Windows file archive format that would keep the file names in UTF-16 or UTF-8, with corresponding software that then uses the wide-char API?
Yes, the directory structure is:

gtkcrash | +-翻訳 +-プログラム

This is Japanese text. I am very suspicious of what you are saying about the codepage thing: NTFS can perfectly display/represent/store Unicode filenames (I can store a filename mixing French (with accents) and Japanese perfectly). In any case, if you try to downgrade this filename back to SJIS then you will have serious trouble... Doesn't GTK use the W versions of all Windows APIs?
> This is japanese text

(But on my machine it just looks like an odd combination of Latin-1 characters...)

> NTFS can perfectly display/represent/store Unicode filename

Yes, of course, as it uses UTF-16.

> Doesn't GTK use the W versions of all Windows APIs?

No. If it did, it wouldn't run on Win9x. And as, for instance, the GDir API is specified to take file names in the system codepage (*not* in UTF-8), there is no way to handle file names that aren't expressible in the system codepage through the GDir API, even if GLib used the wide-char API. See also bug #101792. It might be a good idea to add g_dir_open_utf8(), g_open_utf8(), g_stat_utf8(), g_rename_utf8() etc. API to GLib. I wonder whether it is too late to get that into 2.6...
I thought we settled how this should work previously ... that we ignore codepage encodings and make the filename encoding the UTF-16 filename converted to UTF-8, and then add a few wrappers for C library functions that take char * filenames.
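As a rough illustration of the "UTF-16 filename converted to UTF-8" idea, the sketch below round-trips a name between the two encodings. It is plain Python, not the proposed GLib wrappers; the function names `utf8_to_wide` and `wide_to_utf8` are hypothetical, and the NUL-terminated UTF-16LE buffer is an assumption about what a wide-char Windows function would be handed.

```python
def utf8_to_wide(name_utf8: bytes) -> bytes:
    """UTF-8 file name -> NUL-terminated UTF-16LE buffer."""
    return name_utf8.decode("utf-8").encode("utf-16-le") + b"\x00\x00"


def wide_to_utf8(name_wide: bytes) -> bytes:
    """NUL-terminated UTF-16LE buffer -> UTF-8 file name."""
    text = name_wide.decode("utf-16-le")
    return text.rstrip("\x00").encode("utf-8")


# A name mixing accented Latin letters and Japanese, which no single
# legacy codepage can express: "ete" with accents, then two kanji.
mixed = "\u00e9t\u00e9_\u7ffb\u8a33.txt".encode("utf-8")

wide = utf8_to_wide(mixed)
print(wide_to_utf8(wide) == mixed)
```

Because both encodings cover all of Unicode, the conversion is lossless regardless of the system codepage.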
I believe that if you set your browser encoding to UTF-8, you should see the Japanese text as it is (in my previous comment). However, now you tell me that the system codepage is used to read filenames (which makes it far less useful for many tasks), so clearly you won't be able to reproduce. Did you have a look at the stack trace I posted? As for the issue you presented about local codepage encoding, wouldn't it be clever to just define UTF-8 as the encoding for all filenames in the system and to convert at the edge (just before opening the file you use a temporary string to hold the W version of the string on WinXP/2000/NT and the A version of the string on older OSes)... I am not sure what I say is practical, but now that Linux supports UTF-8 encoding everywhere and Microsoft has supported Unicode since NT 3.51, gtk is just unusable on modern OSes for this reason! I was waiting for that crash to be fixed before starting to build a new application, but it seems I will have to do it in wxPython, which has correct support for Unicode (since it uses the native widgets...). Very unfortunate.
Guillaume: please read the thread from http://mail.gnome.org/archives/gtk-devel-list/2003-October/msg00058.html for background.
Owen, thank you for the link. I see that we want to keep compatibility for Windows 9x users, but it hinders the rest of the world, particularly most people in Asia (check the cc: on this bug to see that a lot of fellow Chinese users are in the list of people who can't use gtk on Windows). Microsoft was shipping a compatibility DLL for previous versions of Windows (all but the very first version of Win95), so we should be able to use the W APIs for most of our needs. So I agree with you, Owen, in your previous post: to make the life of Windows users better, we would definitely love some g_ encapsulation of the common libc functions for portability reasons. People using gtk exclusively on win32 could get away with doing their own UTF-8 to wide-char conversion before passing strings to the underlying libraries, but with the current situation, where the files are represented in the current codepage, I just can't propose that my organization use gtk for their portable endeavour; I would have to propose wx instead (and I would hate to do that). Tor, I can help with debugging that and with the move to UTF-8 based filenames, but I have to warn you that I would need somebody to give me a clear explanation of how to set up a development machine for gtk on Windows. Is there such a document lying around somewhere?
An additional hint... When looking at the fileinfo I see that it uses a utf8 string for the displayname. However, what was in my memory DID NOT LOOK like a utf8 string so I tried to go and see what could have gone wrong. I stumbled on g_path_get_basename() (used in get_file_info()) which seems to do "char" based character search on a multibyte character string... A no-go. Can somebody else confirm that?
> I see that we want to keep compatibility for Windows 9x users,
> but as it hinders the rest of the world (and particularly
> most people in Asia

You misunderstand. Even if GLib and GTK use the non-wide-char API on Windows, they should work fine on multi-byte codepages, if it wasn't for this bug here. (GIMP 1.2 on Windows, which uses a much older GTK version, apparently is/was quite popular in Japan, for instance.) As soon as the bug this report is talking about has been fixed, GTK should work fine with multi-byte codepages. (You won't be able to access files with names in, for instance, Greek or Arabic on a Japanese machine, though.) Is the bug in the new GTK file chooser, or does it occur also in the old file selector, BTW? The "component" field says GtkFileSel, but the traceback in Comment #18 indicates GtkFileChooser. Or do both cause a crash? As to making GLib (and GTK) use the wide-char API on NT-based Windows, you can expect a patch suggestion from me later today, or tomorrow.
WinGimp 2.0.1 crashed when opening the file selector in the same place where the file chooser crashed (launched from my gtk python script). Gimp 1.2 IS very popular in Japan (you probably know about the magazine Windows 100%, which gives a lesson in Gimping every month). However, they advise their readers not to upgrade to 2.0 because of lots of crashes when opening/saving. I was actually surprised that not a lot more people entered a bug here. The kind of applications my company wants me to write need support for Unicode filename encoding (basically at least Chinese, Japanese and Korean supported together). I did not misunderstand; I just don't think anybody in Asia nowadays believes that the "ansi" way (multibyte) is good enough for professional apps. Regards. I am looking forward to your patch suggestion. Guillaume
After renaming the directories in "crashgtk" unzipped from the zipfile you attached as you wrote in Comment #21 (yes, setting IE's text encoding to UTF-8 did make them show up correctly, and, as a somewhat positive surprise, cut-and-paste from IE to Explorer then did work), I still couldn't get any crash. The file selector in GIMP 2.0.x just showed question marks in the file names (as was expected), and GTK+ 2.4.4's testfilechooser didn't show the Japanese names at all. So I am rather sure this bug occurs only on multibyte codepage systems. I guess g_utf8_collate_key() needs to be looked at a bit closer, as it is the one that calls g_malloc() with a ridiculously large argument in your backtrace. g_utf8_collate_key() has two #ifdef branches; presumably the __STDC_ISO_10646__ branch is the one used on Linux and has been more thoroughly tested.
> I stumbled on g_path_get_basename() (used in get_file_info())
> which seems to do "char" based character search on a multibyte
> character string... A no-go.

Hmm, do the East Asian codepages have multibyte sequences with ASCII slashes or backslashes in them? (Yeah, I do have the CJKV book, but I'm too lazy to check this out myself now... you probably know the answer right away.) If not, I think the code should work anyway? If yes, you are correct, and that code needs to be fixed.
I just attached patches that implement the suggested UTF-8 wrappers to bug #101792. On Win32, they use the wide-char API when present. On Unix, they just are wrappers that call g_filename_from_utf8() and call the non-UTF-8 functions. If that gets in GLib, GtkFileChooser presumably should be changed to use the UTF-8 versions. Still, this bug here needs to be found and fixed, of course, we can't ask people to just wait for GTK+ 2.6.
Hi Tor, I don't know as much as I would want about DBCS (I am a natural Unicode child). However, a bit of googling turned up: (this is a google cache) http://64.233.167.104/search?q=cache:dQhsAp_hDAsJ:www.thecodeproject.com/string/cppstringguide1.asp+multibyte+sequence+with+slash+backslash+in+them&hl=ja and then searching further I found the following: http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/JIS/SHIFTJIS.TXT If you look at the column on the left and search it for "5C" (the backslash, but actually the yen sign in Japan... this is all very messy), you will see that there are a lot of potential matches, the most painful being the fullwidth katakana letter "so" (0x835C), which is a very common letter. So definitely doing byte-comparison on a multibyte stream is "verboten"!
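The 0x835C collision can be checked with a few lines of Python. This is only a sketch: Python's `cp932` codec is assumed to be a fair stand-in for the Japanese Windows codepage, and the path used is made up for illustration.

```python
# KATAKANA LETTER SO, the character cited above as 0x835C in Shift-JIS.
so = "\u30bd"
encoded = so.encode("cp932")
print(encoded.hex())  # second byte is 5c, the ASCII backslash

# A made-up Windows path whose last component is "so-fu-to" in katakana.
path_text = "C:\\tmp\\" + "\u30bd\u30d5\u30c8"
path = path_text.encode("cp932")

# Naive byte-wise splitting treats the trail byte of "so" as a separator.
naive_parts = path.split(b"\\")

# Splitting after decoding keeps the component intact.
correct_parts = path.decode("cp932").split("\\")

print(len(naive_parts), len(correct_parts))
```

The naive split yields one component too many and loses the first character of the file name, which is the kind of damage a byte-wise basename search would do.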
OK, thanks for the links. There are *lots* of SJIS double-byte characters where the second byte is an ASCII slash or backslash! Will have to fix those places in GLib that don't handle double-byte strings properly... I am not 100% sure that those problematic places are the cause of the file chooser and/or selector problem described by this bug report, though. Could well be, of course. But maybe best to open a new bug for it until one knows for sure?
What kind of (legacy, non-UTF-8) multi-byte encoding do Unix systems in East Asia use, BTW? Presumably they on purpose use such encodings that never have ASCII slashes in the second byte? (Otherwise all kinds of things would break, the Unix kernels don't know anything about charsets.)
This is a good question. Japan used EUC-JP. I imagine Taiwan used BIG5. China used various combinations of GBK. And I imagine the number of possible issues arising from every possible DBCS encoding in use in the world is quite high. Interestingly, as Windows uses the backslash (0x5C) and Unix uses the solidus (0x2F), the issues are not the same. More interestingly, SJIS does not collide with the forward slash! So if Microsoft had been smart enough to steal the UNIX conventions, we wouldn't have this bug.
> SJIS does not collide with the forward SLASH Hmm, you are correct, I had looked at the wrong column when I thought I saw 2F in there.
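The claims about which encodings collide with which separator can be probed mechanically. The following is a sketch that relies on Python's bundled `cp932`, `euc_jp`, `big5` and `gbk` codecs, which are assumed to be close enough to the encodings discussed here; the helper name `has_trail_byte` is hypothetical.

```python
def has_trail_byte(encoding: str, trail: int) -> bool:
    """Return True if some two-byte character in `encoding` ends in `trail`.

    Every high byte is tried as a lead byte. A pair only counts when the
    two bytes decode to exactly one character, so a single-byte character
    followed by a plain ASCII byte is not mistaken for a collision.
    """
    for lead in range(0x81, 0xFF):
        try:
            decoded = bytes([lead, trail]).decode(encoding)
        except UnicodeDecodeError:
            continue
        if len(decoded) == 1:
            return True
    return False


BACKSLASH, SLASH = 0x5C, 0x2F
for enc in ("cp932", "euc_jp", "big5", "gbk"):
    print(enc, has_trail_byte(enc, BACKSLASH), has_trail_byte(enc, SLASH))
```

Under these assumptions, the forward slash never appears as a trail byte in any of the four, while the backslash does in every one except EUC-JP.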
Hmm, EUC is Extended _Unix_ Code, isn't it? So it shouldn't come as a big surprise that it avoids slashes...
If we handle win32 encoding as I proposed in bug 101792, we should never have to do string operations on SJIS filenames.
Maybe Tor is trying to keep some binary compatibility with the programs that ALREADY directly use fopen on filenames nowadays. If you move all that to utf8 without leaving a compatibility layer, you will anger a lot of people.
Hello, I met the same crash on Windows 2000, but it's the new GtkFileChooser, not the old one. I'm not sure if it's the same problem. The VC debugger shows the call stack:

...
MSVCRT! 7800bec3()
MSVCRT! 78006955()
LIBGLIB-2.0-0! 002d046a()
LIBGLIB-2.0-0! 002cdbed()
LIBGLIB-2.0-0! 002fb054()
gtk_file_info_get_display_key(const _GtkFileInfo * 0x00db7f50) line 137 + 8 bytes
name_sort_func(_GtkTreeModel * 0x101793f5, _GtkTreeIter * 0x00db7f88, _GtkTreeIter * 0x0012dcbc, void * 0x0012dcac) line 3842 + 14 bytes
gtk_tree_model_sort_compare_func(const void * 0x002d38ec, const void * 0x00d949e0, void * 0x00d94950) line 1557
LIBGLIB-2.0-0! 002d38ec()

The debugger shows that the call g_utf8_collate_key (info->display_name, -1); in gtk_file_info_get_display_key() causes the crash. info->display_name is the filename in UTF-8. The string that caused the crash is a Chinese filename: { 230, 150, 176, 229, 187, 186 }, which seems to be a valid UTF-8 string. The version of GTK+ is 2.4.7, compiled by myself (for debug info); GLib is 2.4.5, downloaded from http://www.gimp.org/win32. Windows 2000 Professional Chinese Edition.
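Whether those six bytes really are valid UTF-8 can be checked directly. A small Python sketch, using only the byte values quoted in the comment above:

```python
# The display_name bytes reported by the debugger.
display_name = bytes([230, 150, 176, 229, 187, 186])

# A strict decode raises UnicodeDecodeError if the bytes are not valid UTF-8.
text = display_name.decode("utf-8")

# List the decoded code points for inspection.
code_points = [f"U+{ord(ch):04X}" for ch in text]
print(len(text), code_points)
```

The bytes decode cleanly to two CJK ideographs, which supports the observation that the input to g_utf8_collate_key() was well-formed and the problem lies further down.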
Ohmigod. Look at g_utf8_collate(), in the !__STDC_ISO_10646__ ifdef branch:

    gchar *str_locale = g_convert (str_norm, -1, "UTF-8", charset, NULL, NULL, NULL);

str_norm is in UTF-8, and the code attempts to convert it to the "charset" character set. But the source and destination character set parameters are in the wrong order: g_convert() takes the destination codeset first and the source codeset second. This presumably causes all kinds of interesting stuff to happen. It also explains bug #150394. As such it probably is not enough to directly cause the crash, though. For that we presumably can thank Microsoft's implementation of strxfrm(), which returns INT_MAX on errors, causing g_malloc to be called with INT_MAX+2 == 2147483649.
That particular line is in g_utf8_collate_key() (which the backtrace in this bug indicated), but g_utf8_collate() has two similar lines with the same error.
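To make the effect of the swapped codesets concrete, here is a Python sketch. The `convert` helper is hypothetical and only mirrors the (to, from) argument order of g_convert(); `gbk` is assumed as the system charset, and `errors="replace"` is used purely so the sketch never raises, whereas the real function would report a failure instead.

```python
def convert(data: bytes, to_codeset: str, from_codeset: str) -> bytes:
    """Hypothetical stand-in with the same (to, from) order as g_convert()."""
    text = data.decode(from_codeset, errors="replace")
    return text.encode(to_codeset, errors="replace")


# UTF-8 bytes of the two-character Chinese name from the earlier comment.
original = "\u65b0\u5efa"
str_norm = original.encode("utf-8")
charset = "gbk"  # assumed system charset on a Simplified Chinese machine

# Swapped order: treats the UTF-8 input as if it were already in charset.
buggy = convert(str_norm, "utf-8", charset)

# Intended order: UTF-8 in, charset out.
fixed = convert(str_norm, charset, "utf-8")

print(buggy != fixed, fixed.decode(charset) == original)
```

With the arguments swapped, the result is either mojibake or a failed conversion, so whatever is handed to strxfrm() afterwards is not the locale string the code expects.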
Created attachment 30766 [details] [review] Suggested patch I still can't reproduce the crash, but anyway, it probably is a good idea to guard against bogus return values from strxfrm(). Deciding what return values are bogus is of course debatable. Negative or >= INT_MAX-2 sure are.
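The arithmetic behind the error message, and the kind of guard suggested here, can be sketched as follows. This is not the attached patch: the function names are hypothetical, the bounds are the ones named in the comment above, and the 32-bit wrap is an assumption about the allocation size type on Win32.

```python
INT_MAX = 2**31 - 1  # value of INT_MAX with a 32-bit int


def xfrm_len_is_bogus(n: int) -> bool:
    """Reject lengths that are negative or >= INT_MAX - 2."""
    return n < 0 or n >= INT_MAX - 2


def alloc_size_32bit(xfrm_len: int) -> int:
    """Size requested for the key buffer: length plus 2, wrapped to 32 bits."""
    return (xfrm_len + 2) & 0xFFFFFFFF


# An error return of INT_MAX turns into the size seen in the GLib-ERROR message.
requested = alloc_size_32bit(INT_MAX)
print(requested, hex(requested), xfrm_len_is_bogus(INT_MAX))
```

The resulting value, 0x80000001, also appears to match the first word shown on the g_malloc frame in the earlier WinDbg trace.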
Great! After applying the patch, gtk_file_selection works fine on my Simplified Chinese Windows XP now!
The key to reproducing this crash seems to be: Control Panel->Regional Options->General->Your locale (location). It crashes only with the locale Chinese (PRC), and works just fine with English (United States). In this way I can reproduce it even on an English Edition of Win2000.
Sorry, not even that way can I reproduce a crash. At least not with testgtk's file selector, or testfilechooser, browsing a directory with the file names given in comment #21 above. (English edition of Win2k here, too.) (I would have thought that the crash is related to what the system codepage is. Changing the locale in the Regional Options doesn't change the system codepage. I think I have read that the system codepage is "hardcoded" for each Windows edition, or at least changing it would require a reboot.)
Patch applied to HEAD and glib-2-4. Together with the fix for bug #150394, this bug probably is fixed, resolving. Will open separate bug reports for the cases in GLib and GTK where strchr() or strrchr() are used to search for backslashes in strings that represent filenames in the system codepage. (One should use _mbschr() or _mbsrchr() instead on Windows in these cases.)
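For readers unfamiliar with what a lead-byte-aware search does differently from strrchr(), here is a rough Python sketch of the idea. It is not the C runtime's _mbsrchr(): the function names are hypothetical and the lead-byte ranges are hardcoded for codepage 932 only.

```python
def is_cp932_lead_byte(b: int) -> bool:
    """Lead-byte ranges of codepage 932 (assumed: 0x81-0x9F and 0xE0-0xFC)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def mbs_rfind(path: bytes, sep: int = 0x5C) -> int:
    """Index of the last separator that is not a trail byte, or -1."""
    last = -1
    i = 0
    while i < len(path):
        if is_cp932_lead_byte(path[i]) and i + 1 < len(path):
            i += 2  # skip the whole double-byte character
            continue
        if path[i] == sep:
            last = i
        i += 1
    return last


# Made-up path whose file name starts with KATAKANA LETTER SO (0x83 0x5C).
path = ("C:\\tmp\\" + "\u30bd\u30d5\u30c8").encode("cp932")

naive = path.rfind(b"\\")   # byte-wise search, like strrchr()
aware = mbs_rfind(path)     # lead-byte-aware search

print(naive, aware, path[aware + 1:].decode("cp932"))
```

The byte-wise search lands inside the first character of the file name, while the aware search stops at the real directory separator.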