Bug 141756 - Find/replace doesn't work for some non-ascii chars
Status: RESOLVED FIXED
Product: gtksourceview
Classification: Platform
Component: General
Version: git master
Hardware: Other Linux
Importance: Urgent major
Target Milestone: ---
Assigned To: GTK Sourceview maintainers
QA Contact: GTK Sourceview maintainers
Depends on:
Blocks:
 
 
Reported: 2004-05-03 19:08 UTC by John Spray
Modified: 2004-12-22 21:47 UTC
See Also:
GNOME target: ---
GNOME version: 2.7/2.8


Attachments
testcase (605 bytes, text/plain)
2004-07-19 16:38 UTC, Paolo Maggi

Description John Spray 2004-05-03 19:08:15 UTC
1. Open a file containing some non-ASCII characters such as „ (&bdquo; in HTML,
German opening quote).
2. Do a "find" for such a character.  I'm doing this by selecting the character
and then choosing Search->Find.
3. Notice that for some (not all) instances of the character, a later character
(by 2 to 4 places) is selected.

I have also noticed this with the ´ (&prime; in HTML) character.  In this case,
the find reliably selects both the prime character and the character that
follows it.

When the replace function is used, the erroneously selected char(s) are replaced.

Example document at http://icculus.org/~jcspray/dr.html

Screenshot illustrating bug at http://icculus.org/~jcspray/gedit_bug.png (note
that 'n' is selected incorrectly instead of the low-quote).
Comment 1 Paolo Maggi 2004-07-19 16:37:04 UTC
I can reproduce it using the attached file.

This is a gtksourceview bug.

Gustavo: what do you think?
Comment 2 Paolo Maggi 2004-07-19 16:38:18 UTC
Created attachment 29655
testcase
Comment 3 Gustavo Giráldez 2004-07-20 13:41:11 UTC
Damn find bugs ;-)  Sorry for the delay.  I'll take a look at this later today.
Comment 4 Gustavo Giráldez 2004-07-20 20:23:21 UTC
Ok, there is a difference in the way the prime ´ character is expanded during
normalization and in decomposition.  Normalization expands it to 2 UTF8 chars (3
bytes) while decomposition doesn't do anything to it.  AFAICS, normalization
treats the prime character as an accented space, and that's why it decomposes
it.  What I don't know is why decomposition does a different thing.  Sigh.
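
The discrepancy described above can be reproduced outside GLib. The following is only a rough sketch using Python's unicodedata module as a stand-in, not the gtksourceview code. It assumes the character is U+00B4 ACUTE ACCENT as it appears in this report, and it assumes the "normalization" side behaves like a compatibility form (NFKD) while the "decomposition" side is canonical only (NFD). The thread does not say which mode the search code actually used.

```python
import unicodedata

# Assumption: the "prime" in this report is U+00B4 ACUTE ACCENT.
ch = "\u00b4"

# Canonical decomposition only: U+00B4 has no canonical mapping.
canonical = unicodedata.normalize("NFD", ch)

# Compatibility decomposition: U+00B4 maps to SPACE + COMBINING ACUTE ACCENT.
compat = unicodedata.normalize("NFKD", ch)

print(len(canonical), len(canonical.encode("utf-8")))  # 1 2
print(len(compat), len(compat.encode("utf-8")))        # 2 3
print([unicodedata.name(c) for c in compat])
# ['SPACE', 'COMBINING ACUTE ACCENT']
```

The compatibility form yields 2 characters in 3 bytes, which matches the "accented space" expansion, while the canonical form leaves the character untouched.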

It can't be that hard to be UTF8 correct.  I'll work out a patch later,
replacing the calls to g_unicode_canonical_decomposition() with normalizations
of 1-character strings.
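
A minimal sketch of that idea, again in Python rather than GLib and not the committed patch: a match offset found in the normalized text is mapped back to the original text by normalizing one character at a time and counting how many characters each one expands to. The helper names (expansion, original_offset) and the sample string are hypothetical, and NFKD is an assumed normalization form.

```python
import unicodedata

def expansion(ch: str, form: str = "NFKD") -> int:
    """Number of code points a single character becomes after normalization."""
    return len(unicodedata.normalize(form, ch))

def original_offset(text: str, normalized_offset: int, form: str = "NFKD") -> int:
    """Map a code-point offset in the normalized text back to an offset in text."""
    consumed = 0
    for index, ch in enumerate(text):
        if consumed >= normalized_offset:
            return index
        # Normalize a 1-character string so the count uses the same rules
        # as the normalization applied to the whole search text.
        consumed += expansion(ch, form)
    return len(text)

# Hypothetical sample: U+00B4 before the low quote shifts later offsets by one.
text = "a\u00b4b\u201ec"
needle = "\u201e"

normalized = unicodedata.normalize("NFKD", text)
hit = normalized.find(unicodedata.normalize("NFKD", needle))

print(hit)                          # 4
print(repr(text[hit]))              # 'c', a later character than the match
print(original_offset(text, hit))   # 3, the offset of the low quote
```

Using the normalized offset directly selects a later character, which is the symptom in the description; counting expansions per character with the same normalization keeps the two offsets in step.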
Comment 5 Gustavo Giráldez 2004-07-21 22:09:10 UTC
I committed a fix for this.  Can you guys please test it and if it works
correctly close the bug?  Thanks.
Comment 6 Paolo Maggi 2004-07-22 08:47:24 UTC
Great Gustavo!
Your patch seems to fix the problem for me.

Closing.