GNOME Bugzilla – Bug 59320
rewrite of mozilla_locale_to_unicode, mozilla_unicode_to_locale
Last modified: 2004-12-22 21:47:04 UTC
In mozilla_locale_to_unicode() and mozilla_unicode_to_locale() in mozilla.cpp, wcstombs() and mbstowcs() are used for encoding conversion. These calls assume that wchar_t holds Unicode. On platforms other than Linux, wchar_t is not always Unicode, so we cannot use these functions to convert Unicode strings to and from locale strings. We must use an encoding conversion function such as iconv(3) instead. But iconv(3) may introduce another problem: its conversion tables can differ from Mozilla's. Mozilla uses its own encoding conversion functions, while iconv(3) in glibc (or libiconv, etc.) uses other conversion tables. Most of these tables are the same, but some portions differ. For example, Mozilla converts 0xA1C1 in EUC-JP (== 0x2141 in JIS X 0208) to U+FF5E, whereas iconv(3) in glibc 2.2.4 converts U+FF5E to SS3 0xA2B7 in EUC-JP (0x2237 in JIS X 0212). As a result, the titles of some pages cannot be rendered correctly in galeon, and bookmarks of such pages will be corrupted. Mozilla's nsIUnicodeEncoder/Decoder are therefore appropriate for this purpose. I've rewritten mozilla_locale_to_unicode() / mozilla_unicode_to_locale() in mozilla.cpp.
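To illustrate the wchar_t assumption described above, here is a minimal sketch, not code from the patch. It relies on the standard C macro __STDC_ISO_10646__, which an implementation may define to promise that wchar_t values are ISO 10646 (Unicode) code points. The helper names wchar_is_unicode() and wchar_bits() are hypothetical and do not exist in galeon or Mozilla.

```cpp
#include <climits>
#include <cwchar>

// Hypothetical helper: reports whether this implementation promises that
// wchar_t values are ISO 10646 (Unicode) code points. Defining
// __STDC_ISO_10646__ is optional, so the result varies by platform.
inline bool wchar_is_unicode()
{
#ifdef __STDC_ISO_10646__
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: width of wchar_t in bits. The width also differs
// between platforms, which is a second reason not to treat the output of
// mbstowcs() as a portable Unicode string.
inline unsigned wchar_bits()
{
    return static_cast<unsigned>(sizeof(wchar_t) * CHAR_BIT);
}
```

Code that feeds the result of mbstowcs() to a Unicode API is only safe where wchar_is_unicode() returns true. Even there, the table mismatch described above still applies to any converter other than Mozilla's own.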
Created attachment 925: patch to mozilla.cpp
Heh, funnily enough, we originally used the Mozilla encoders and decoders. :-) Can you please reformat this as a unified diff? Also, I'll need more convincing to let a goto in like that. :-)
Created attachment 926: unified diff version
Ok, still, I'm uncomfortable with the goto thing. Can you restructure it without using gotos? That said, I do prefer the mozilla based implementation and I want this in eventually.
Created attachment 978: no 'goto' version.
I've dug through the Mozilla source code and found 6 Japanese characters that may break bookmarks, history, the title bar, etc. The characters are 0x2141, 0x2142, 0x215D, 0x2171, 0x224C (in JIS X 0208). Mozilla converts them to Unicode U+FF5E, U+2225, U+FF0D, U+FFE0, U+FFE2, respectively, and iconv(3) in glibc 2.2.4 cannot convert these Unicode characters back into JIS X 0208 characters correctly. glibc's iconv(3) converts the original JIS X 0208 characters to Unicode U+301C, U+2016, U+2212, U+00A2, U+00AC, respectively. This glibc behavior is based on the mapping table distributed by unicode.org, which is now obsolete. See http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/ You can find more information about ambiguities in conversion at http://www.w3.org/TR/2000/NOTE-japanese-xml-20000414/
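The mismatch can be sketched with a small table holding only the code points and mappings listed above. This is an illustration, not Mozilla or glibc code: the struct and function names are hypothetical. A return value of 0 only means "no JIS X 0208 entry in this tiny table", whereas a real converter may emit a code from another character set instead.

```cpp
#include <cstdint>

// One row per JIS X 0208 code point listed in the comment above.
struct JisMapping {
    std::uint16_t jis;   // JIS X 0208 code point
    char32_t mozilla;    // Unicode value Mozilla decodes it to
    char32_t glibc;      // Unicode value glibc 2.2.4 iconv(3) uses
};

static const JisMapping kAmbiguous[] = {
    {0x2141, 0xFF5E, 0x301C},
    {0x2142, 0x2225, 0x2016},
    {0x215D, 0xFF0D, 0x2212},
    {0x2171, 0xFFE0, 0x00A2},
    {0x224C, 0xFFE2, 0x00AC},
};

// Decode a JIS X 0208 code point the way Mozilla does (0 if not in table).
inline char32_t decode_mozilla_style(std::uint16_t jis)
{
    for (const JisMapping& row : kAmbiguous)
        if (row.jis == jis) return row.mozilla;
    return 0;
}

// Encode a Unicode value with the glibc-style mapping (0 if not in table).
inline std::uint16_t encode_glibc_style(char32_t unicode)
{
    for (const JisMapping& row : kAmbiguous)
        if (row.glibc == unicode) return row.jis;
    return 0;
}

// True if decoding with one table and encoding with the other gives back
// the original JIS X 0208 code point.
inline bool round_trip_survives(std::uint16_t jis)
{
    return encode_glibc_style(decode_mozilla_style(jis)) == jis;
}
```

For every row in this table round_trip_survives() returns false, which is the corruption path: a string decoded by Mozilla and re-encoded through the other table does not come back as the same bytes.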
I've checked out the latest CVS galeon and am using it. The committed patch seems to be working well, so I'm closing this bug. Thanks.
CVS galeon now stores bookmark strings in UTF-8. With this change, the bookmark/history corruption has appeared again. So I wrote two functions, mozilla_utf8_to_locale() and mozilla_locale_to_utf8(), and replaced some calls to utf8_to_locale() and locale_to_utf8() with them. (If all locale_to_utf8()/utf8_to_locale() calls are replaced with the new functions, galeon does not run correctly. I have not yet examined why this happens; it may be an XPCOM-related problem.)
Created attachment 5060: replace utf8_to_locale()/locale_to_utf8() with new functions
I'm almost certain that the problem here is that Mozilla is not initialised when the bookmarks are first loaded; we'd have to change the init order for this to work.
looks like something we should get right before 1.0
I moved the XPCOM startup. Can you check whether your patch works now? Thanks
I've just checked out the new CVS galeon and applied my patch, and it seems to work fine. I also replaced more locale_to_utf8() and utf8_to_locale() calls with the new functions and compiled; this also works fine. I'm attaching a new version of the patch.
Created attachment 5586: New version of locale <-> utf8 patch
Committed, thanks a lot!
I'm reopening this bug until we fix the problems that some people have with bookmarks.
*** Bug 60813 has been marked as a duplicate of this bug. ***
Created attachment 5647: Fix bookmark search related bugs.
I think the bookmark search related bug has been fixed, so I'm closing this.