GNOME Bugzilla – Bug 141756
Find/replace doesn't work for some non-ascii chars
Last modified: 2004-12-22 21:47:04 UTC
1. Open a file containing some non-ASCII characters such as „ (&bdquo; in HTML, the German opening quote).
2. Do a "find" for such a character. I'm doing this by selecting the character and then choosing Search->Find.
3. Notice that for some (not all) instances of the character, a character 2 to 4 places later is selected instead.

I have also noticed this with the ´ (&prime; in HTML) character. In this case, the find reliably selects both the prime character and the character after it. When the replace function is used, the erroneously selected character(s) are replaced.

Example document at http://icculus.org/~jcspray/dr.html
Screenshot illustrating the bug at http://icculus.org/~jcspray/gedit_bug.png (note that 'n' is selected incorrectly instead of the low quote).
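These symptoms match a search that finds the match in a normalized copy of the text but then applies the match offsets to the original, unnormalized text. The Python sketch below illustrates that failure mode. It is not gtksourceview code. It assumes compatibility decomposition (NFKD) as the normalization, which fits the analysis later in this thread but is not confirmed there, and `buggy_find` is a hypothetical name.

```python
import unicodedata


def buggy_find(text: str, needle: str):
    """Hypothetical buggy search: match on NFKD-normalized copies, then use
    the offsets from the normalized copy directly on the original text."""
    norm_text = unicodedata.normalize("NFKD", text)
    norm_needle = unicodedata.normalize("NFKD", needle)
    start = norm_text.find(norm_needle)
    if start < 0:
        return None
    end = start + len(norm_needle)
    # Bug: start/end index norm_text, but they are applied to text.
    return text[start:end]


# U+00B4 expands to 2 characters under NFKD, so the selection also grabs the next one.
print(repr(buggy_find("a\u00b4bc", "\u00b4")))            # '´b'
# Each earlier U+00B4 shifts later matches one place to the right.
print(repr(buggy_find("x\u00b4y \u201eHallo", "\u201e")))  # 'H'
```

Several expanding characters earlier in the document would push the selection further right. That is consistent with the "2 to 4 places" drift reported for the low quote, which has no decomposition of its own.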
I can reproduce it using the attached file. This is a gtksourceview bug. Gustavo: what do you think?
Created attachment 29655: testcase
Damn find bugs ;-) Sorry for the delay. I'll take a look at this later today.
OK, there is a difference in the way the prime ´ character is expanded during normalization and during decomposition. Normalization expands it to two characters (3 bytes of UTF-8), while decomposition leaves it unchanged. AFAICS, normalization treats the prime character as an accented space, and that's why it decomposes it. What I don't know is why decomposition does something different. Sigh. It can't be that hard to be UTF-8 correct. I'll work out a patch later, replacing the calls to g_unicode_canonical_decomposition() with normalizations of 1-character strings.
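The mismatch can be reproduced with Python's unicodedata module. It is used here only as a stand-in for GLib's g_unicode_canonical_decomposition() and for whatever GLib normalization call was involved (the thread does not name it). U+00B4 has only a compatibility decomposition (SPACE + COMBINING ACUTE ACCENT) and no canonical one. As a result, canonical decomposition (NFD) leaves it alone, while compatibility normalization (NFKD) expands it. This probably answers the "why" above, though the thread does not say so.

The second half sketches the fix idea described in the comment: normalize one character at a time with the same form used for matching, so that every normalized position maps back to its source character. The NFKD choice and the names normalize_with_map and find are assumptions for illustration. This is not the committed patch.

```python
import unicodedata

ACUTE = "\u00b4"  # U+00B4 ACUTE ACCENT, the "prime" character in this report

# Canonical decomposition leaves U+00B4 unchanged, because its only
# decomposition is a compatibility one: SPACE + COMBINING ACUTE ACCENT.
print(unicodedata.decomposition(ACUTE))                           # <compat> 0020 0301
print(len(unicodedata.normalize("NFD", ACUTE)))                   # 1
# Compatibility normalization expands it to 2 characters, 3 bytes of UTF-8.
print(len(unicodedata.normalize("NFKD", ACUTE)))                  # 2
print(len(unicodedata.normalize("NFKD", ACUTE).encode("utf-8")))  # 3


def normalize_with_map(text: str, form: str = "NFKD"):
    """Normalize `text` one character at a time with the same form used for
    matching, recording for each normalized position the index of the
    original character it came from."""
    pieces, origin = [], []
    for i, ch in enumerate(text):
        piece = unicodedata.normalize(form, ch)
        pieces.append(piece)
        origin.extend([i] * len(piece))
    return "".join(pieces), origin


def find(text: str, needle: str, form: str = "NFKD"):
    """Hypothetical search returning (start, end) offsets into the ORIGINAL
    text, or None if `needle` does not occur."""
    norm_text, origin = normalize_with_map(text, form)
    norm_needle, _ = normalize_with_map(needle, form)
    if not norm_needle:
        return None
    start = norm_text.find(norm_needle)
    if start < 0:
        return None
    end = start + len(norm_needle)
    # Translate back through the map; end - 1 is the last matched normalized
    # position, so the original character it came from is included whole.
    return origin[start], origin[end - 1] + 1


text = "x\u00b4y \u201eHallo"
start, end = find(text, "\u201e")
print(text[start:end])  # „
```

Normalizing character by character keeps the offset map exact. The cost is that it skips the canonical reordering of combining-mark sequences that span several characters, which whole-string NFKD would perform. A real implementation would have to decide how to handle that case.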
I committed a fix for this. Can you guys please test it and, if it works correctly, close the bug? Thanks.
Great, Gustavo! Your patch seems to fix the problem for me. Closing.