Bug 484653 - imxim string conversion callback should pass wide-char surrounding
Status: RESOLVED NOTGNOME
Product: gtk+
Classification: Platform
Component: Input Methods
Version: 2.12.x
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: Hidetoshi Tajima
QA Contact: gtk-bugs
Depends on:
Blocks:
 
 
Reported: 2007-10-08 09:54 UTC by Theppitak Karoonboonyanan
Modified: 2009-06-05 13:28 UTC
See Also:
GNOME target: ---
GNOME version: 2.19/2.20


Attachments
patch passing surrounding text as wide-char (2.04 KB, patch)
2007-10-08 09:56 UTC, Theppitak Karoonboonyanan
Patch using utf-16 for 2-byte wchar_t (2.42 KB, patch)
2007-10-10 02:43 UTC, Theppitak Karoonboonyanan

Description Theppitak Karoonboonyanan 2007-10-08 09:54:22 UTC
The string conversion callback in imxim (see Bug #101814 for the initial patch) currently passes multi-byte surrounding text back to XIM. This is ambiguous, as a multi-byte string can be in any encoding. Unfortunately, Thai XIM in xorg assumes it to be TIS-620, but the g_locale_from_utf8() conversion yields UTF-8 in UTF-8 locales. As a result, Thai XIM does not work properly with GTK+ apps in the th_TH.UTF-8 locale, for example.

It should be safer to pass it as wide-char instead.
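
To illustrate the ambiguity described above, here is a minimal sketch in plain C showing that the same Thai character produces different byte sequences in TIS-620 and in UTF-8. The helper names `encode_tis620` and `encode_utf8` are hypothetical and are not taken from GTK+, Xlib, or the attached patch; the sketch also assumes the input is a valid Unicode code point and does no validation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the patch): encode one code point as TIS-620.
 * Returns the number of bytes written (always 1), or 0 if the code point
 * has no TIS-620 representation. */
size_t encode_tis620(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    /* The Thai block maps by a fixed offset: U+0E01 -> 0xA1, and so on. */
    if ((cp >= 0x0E01 && cp <= 0x0E3A) || (cp >= 0x0E3F && cp <= 0x0E5B)) {
        out[0] = (unsigned char)(cp - 0x0E00 + 0xA0);
        return 1;
    }
    return 0;
}

/* Hypothetical helper: encode one code point as UTF-8.
 * Assumes cp is a valid code point; returns the number of bytes written. */
size_t encode_utf8(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

For THAI CHARACTER KO KAI (U+0E01), `encode_tis620` writes the single byte 0xA1, while `encode_utf8` writes the three bytes 0xE0 0xB8 0x81. A receiver that assumes TIS-620 would read the UTF-8 form as three unrelated characters, which is the kind of mismatch reported here.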
Comment 1 Theppitak Karoonboonyanan 2007-10-08 09:56:58 UTC
Created attachment 96868
patch passing surrounding text as wide-char
Comment 2 Tor Lillqvist 2007-10-08 10:47:07 UTC
Does that patch assume that wchar_t is four bytes? Is that true for all compilers/platforms involved? (It isn't true on Windows, but that is of course not relevant for XIM stuff.)
Comment 3 Theppitak Karoonboonyanan 2007-10-08 11:07:58 UTC
(In reply to comment #2)
> Does that patch assume that wchar_t is four bytes?

I don't think so. For example, I've avoided memcpy() and used element-wise copy instead, and sizeof() is used when calculating the buffer size. The rest is based on the X protocol headers.

Please correct me if I'm still missing something.
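
The approach described in this comment (element-wise copy instead of memcpy(), with sizeof() used for the buffer size) might look roughly like the sketch below. This is not the code from the attachment; `ucs4_to_wchar_copy` is a hypothetical name, and the sketch assumes the source is an array of 32-bit code points.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: copy `len` 32-bit code points into a newly allocated,
 * NUL-terminated wchar_t string. The caller frees the result with free().
 * Returns NULL if allocation fails. */
wchar_t *ucs4_to_wchar_copy(const uint32_t *src, size_t len)
{
    /* Size the buffer with sizeof(wchar_t) rather than a hard-coded 4. */
    wchar_t *dst = malloc((len + 1) * sizeof(wchar_t));
    if (dst == NULL)
        return NULL;

    /* Assign element by element: memcpy() would only be correct if
     * wchar_t and uint32_t had the same size and representation. */
    for (size_t i = 0; i < len; i++)
        dst[i] = (wchar_t)src[i];

    dst[len] = L'\0';
    return dst;
}
```

The copy itself does not depend on the width of wchar_t, but each assignment narrows the value when wchar_t is 16 bits wide, so code points above U+FFFF would not survive it.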
Comment 4 Tor Lillqvist 2007-10-08 20:47:46 UTC
I think the code won't work properly for non-BMP characters in case wchar_t is two bytes (and wchar_t strings in XIM are supposed to be in UTF-16).

Are there such platforms/XIM implementations? If yes, your code probably needs some ifdefs: in case sizeof(wchar_t)==2, you should call g_utf8_to_utf16() instead of g_utf8_to_ucs4(), and "text" should then be wchar_t* instead of gunichar*.
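
A rough sketch of the compile-time branching suggested in this comment follows. To stay self-contained it does not use GLib; in GLib-based code the two branches would presumably call g_utf8_to_utf16() and g_utf8_to_ucs4() as the comment says. The names `utf16_encode` and `put_wchar` are hypothetical, the width check uses `WCHAR_MAX` as a stand-in for `sizeof(wchar_t)==2` (sizeof cannot be evaluated by the preprocessor), and the input is assumed to be a valid code point.

```c
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper: encode one code point as UTF-16.
 * Returns the number of 16-bit units written (1 or 2). */
size_t utf16_encode(uint32_t cp, uint16_t *out)
{
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    /* Non-BMP code points need a surrogate pair. */
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 + (cp >> 10));   /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF)); /* low surrogate */
    return 2;
}

/* Hypothetical helper: store one code point into a wchar_t buffer that has
 * room for at least 2 elements. Returns the number of elements written. */
size_t put_wchar(uint32_t cp, wchar_t *out)
{
#if WCHAR_MAX <= 0xFFFF
    /* 16-bit wchar_t: store UTF-16 units, possibly a surrogate pair. */
    uint16_t units[2];
    size_t n = utf16_encode(cp, units);
    for (size_t i = 0; i < n; i++)
        out[i] = (wchar_t)units[i];
    return n;
#else
    /* Wider wchar_t: one element holds any code point. */
    out[0] = (wchar_t)cp;
    return 1;
#endif
}
```

With a 16-bit wchar_t, a non-BMP code point such as U+1F600 occupies two elements (0xD83D, 0xDE00), so the buffer length can no longer be taken from the character count alone.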
Comment 5 Theppitak Karoonboonyanan 2007-10-09 04:40:11 UTC
AFAIK, Thai XIM in libX11 seems to be the only implementation that uses this callback. But surely there can be other XIM servers implemented as separate processes.

So I now wonder which encoding an XIM server implementor should expect for the wide-char string from the client: UCS-4 or UTF-16? Note that the XIM server can be across the network, so its wchar_t can differ from that of the machine the client is running on.

For the particular case of Thai XIM, though, the server and client are fortunately in the same process. So, different wchar_t size is not a problem.
Comment 6 Theppitak Karoonboonyanan 2007-10-10 02:43:13 UTC
Created attachment 96976
Patch using utf-16 for 2-byte wchar_t

Ignoring the protocol question, this should work for Thai XIM.
Comment 7 Theppitak Karoonboonyanan 2007-10-10 02:52:31 UTC
Alternatively, I have also filed a bug against xorg to make Thai XIM convert the multi-byte text based on locale before using it:

  https://bugs.freedesktop.org/show_bug.cgi?id=12759

(Toolkits other than GTK+ probably still pass multi-byte surrounding text.)
Comment 8 Theppitak Karoonboonyanan 2009-06-05 13:28:32 UTC
Resolving this bug as NOTGNOME. Given that the XIM now properly converts the multi-byte surrounding text based on the current locale, this problem should be gone.