GNOME Bugzilla – Bug 140537
Pasting non-ascii character fails
Last modified: 2011-02-04 16:16:42 UTC
Pasting text with non-ascii characters fails when the selection request is for TARGET_STRING or target utf8. In the case of a utf8 request, it fails because utf8 isn't a recognized target in gdk_selection_convert. In the case of a TARGET_STRING request, the wchar_t string from Windows is converted to utf8 in gdk_selection_convert and then the utf8 string is interpreted as a latin1 string in gdk_text_property_to_utf8_list_for_display. The attached patch adds utf8 to the list of recognized targets in gdk_selection_convert, but doesn't do anything to fix TARGET_STRING.
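The TARGET_STRING failure described above is a double conversion: bytes that are already UTF-8 get run through a Latin-1 to UTF-8 step a second time. A minimal sketch of that effect in plain C follows. The helper latin1_to_utf8 is hypothetical, written only for illustration; it is not the code in gdk_text_property_to_utf8_list_for_display.

```c
#include <stddef.h>

/* Hypothetical helper (not a GDK function): re-encode a buffer as UTF-8
 * on the assumption that every input byte is one ISO-8859-1 character.
 * `out` must have room for 2 * len + 1 bytes.  Returns the number of
 * bytes written, not counting the terminating NUL. */
size_t
latin1_to_utf8 (const unsigned char *in, size_t len, unsigned char *out)
{
  size_t i, n = 0;

  for (i = 0; i < len; i++)
    {
      if (in[i] < 0x80)
        {
          /* ASCII is identical in Latin-1 and UTF-8. */
          out[n++] = in[i];
        }
      else
        {
          /* U+0080..U+00FF always take two bytes in UTF-8. */
          out[n++] = (unsigned char) (0xC0 | (in[i] >> 6));
          out[n++] = (unsigned char) (0x80 | (in[i] & 0x3F));
        }
    }
  out[n] = '\0';
  return n;
}
```

Feeding it the two UTF-8 bytes of U+00E9 (0xC3 0xA9) yields four bytes, 0xC3 0x83 0xC2 0xA9, which display as two unrelated Latin-1 characters instead of the original one. That is the kind of gibberish the second reproduction step in this report produces.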
Created attachment 26832: patch for gdk_selection_convert to recognize utf8 as a target
Would you have a sample program for both cases? Or instructions how to exhibit the problem with testgtk?
To see the problem in testgtk, first comment out the if (!result) clause in gtkclipboard.c as done in the following diff. This eliminates the fallback from utf8 -> compound text -> string targets.

--- external/gtk2/gtk+/gtk/gtkclipboard.c 28 Mar 2003 23:54:24 -0000 1.1
+++ external/gtk2/gtk+/gtk/gtkclipboard.c 29 Apr 2004 18:35:00 -0000
@@ -770,6 +770,7 @@ request_text_received_func (GtkClipboard
 
   result = gtk_selection_data_get_text (selection_data);
 
+#if 0
   if (!result)
     {
       /* If we asked for UTF8 and didn't get it, try compound_text;
@@ -791,6 +792,7 @@ request_text_received_func (GtkClipboard
           return;
         }
     }
+#endif // 0
 
   info->callback (clipboard, result, info->user_data);
   g_free (info);

Then run testgtk or testtext, select some non-ascii characters using the character map accessory or with some other program, and paste into a TextView with ^v. Nothing appears without the patch. If the initial target is then set to string as with the following diff, the utf8 is treated as a latin1 string and gibberish is added to the buffer.

@@ -828,7 +830,7 @@ gtk_clipboard_request_text (GtkClipboard
   info->callback = callback;
   info->user_data = user_data;
 
-  gtk_clipboard_request_contents (clipboard, gdk_atom_intern ("UTF8_STRING", FALSE),
+  gtk_clipboard_request_contents (clipboard, GDK_TARGET_STRING,
                                   request_text_received_func,
                                   info);
 }
About TARGET_STRING, is it so that applications (and/or gtk) really use it and do need it to work, and that means that the data is in the locale's charset? The code in gdk/win32/gdkselection-win32.c and gdkproperties-win32.c is presently somewhat confused about this. It would be so much cleaner if I could just assume that UTF-8 is all that is used. But if necessary, it can be fixed, I am looking at it now. What about COMPOUND_TEXT, do applications and/or gtk ever peek into such values? If not, I assume I can just punt and handle it like UTF8_STRING? (Emulating the real X11 COMPOUND_TEXT format would be rather tedious.)
Created attachment 27747: Tentative patch
I went through the mess, and hopefully now it is in better shape. Seems to work on Windows 2000. Somebody else will have to test on Win9x. (There are significant differences. There is no CF_UNICODETEXT clipboard format on Win9x.)
Looks good, though I still need to test on win98. Could the support for COMPOUND_TEXT be dropped rather than just substituting utf8? It seems that this would be a bit more honest than relying on apps to treat it opaquely and to pass the data to other gdk functions for decoding.
It works on win98: text that I'm pretty sure is not representable in any installed locale (U+04E8, Cyrillic capital letter barred O, on a machine with only US English support installed) was copied and pasted successfully into Notepad. Copying the same character from Notepad into the GTK-based app (Wing IDE) resulted in the U+04E8 being replaced with ?, as expected.
<owen> tml: I can't conceive of any GTK+ apps that only support [COMPOUND_TEXT]. Just support STRING (which is latin-1, not encoding of locale) and UTF-8 and you should be good to go.
<tml> owen: ok, will drop it then. good riddance ;-)
<tml> ok, will fix STRING to be latin-1
Created attachment 28617: Revised patch: STRING is always ISO-8859-1, don't even pretend any COMPOUND_TEXT support
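To illustrate what "STRING is always ISO-8859-1" implies for a UTF-8 based toolkit, here is a rough sketch of down-converting UTF-8 to Latin-1 in plain C. The helper utf8_to_latin1 is hypothetical, and the '?' substitution for unrepresentable characters is an assumption made for this example; the thread does not say how the revised patch treats them.

```c
#include <stddef.h>

/* Hypothetical helper (not a GDK function): convert NUL-terminated UTF-8
 * to ISO-8859-1.  Code points above U+00FF and malformed sequences are
 * written as '?' (an assumption for this sketch).  Overlong encodings
 * are not rejected.  `out` must have room for strlen(in) + 1 bytes.
 * Returns the number of bytes written, not counting the NUL. */
size_t
utf8_to_latin1 (const unsigned char *in, unsigned char *out)
{
  size_t n = 0;

  while (*in)
    {
      unsigned int c = *in++;
      unsigned int cp;
      int extra;

      /* Decode the lead byte: payload bits and continuation count. */
      if (c < 0x80)
        { cp = c; extra = 0; }
      else if ((c & 0xE0) == 0xC0)
        { cp = c & 0x1F; extra = 1; }
      else if ((c & 0xF0) == 0xE0)
        { cp = c & 0x0F; extra = 2; }
      else if ((c & 0xF8) == 0xF0)
        { cp = c & 0x07; extra = 3; }
      else
        {
          /* Stray continuation byte or invalid lead byte. */
          out[n++] = '?';
          continue;
        }

      /* Fold in continuation bytes (10xxxxxx) while they are present. */
      while (extra > 0 && (*in & 0xC0) == 0x80)
        {
          cp = (cp << 6) | (*in++ & 0x3Fu);
          extra--;
        }

      /* Only complete sequences in U+0000..U+00FF fit in Latin-1. */
      out[n++] = (extra == 0 && cp <= 0xFF) ? (unsigned char) cp : '?';
    }
  out[n] = '\0';
  return n;
}
```

With this sketch, U+00E9 (bytes 0xC3 0xA9) maps to the single byte 0xE9, while U+04E8 (bytes 0xD3 0xA8) has no Latin-1 equivalent and comes out as '?'. In real GLib code, a conversion routine such as g_convert_with_fallback would be the more natural tool than a hand-written loop.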
Mass changing gtk+ bugs with target milestone of 2.4.2 to target 2.4.4, as Matthias said he was trying to do himself on IRC and was asking for help with. If you see this message, it means I was successful at fixing the borken-ness in bugzilla :) Sorry for the spam; just query on this message and delete all emails you get with this message, since there will probably be a lot.
Finally applied the patch to HEAD and gtk-2-4.