Bug 141756 – Find/replace doesn't work for some non-ascii chars




Summary:	Find/replace doesn't work for some non-ascii chars


Status:	RESOLVED FIXED

Product:	gtksourceview
Classification:	Platform
Component:	General
Version:	git master
Hardware:	Other Linux

Importance:	Urgent major
Target Milestone:	---
Assigned To:	GTK Sourceview maintainers
QA Contact:	GTK Sourceview maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2004-05-03 19:08 UTC by John Spray
Modified:	2004-12-22 21:47 UTC

See Also:
GNOME target:	---
GNOME version:	2.7/2.8

Attachments
testcase (605 bytes, text/plain) 2004-07-19 16:38 UTC, Paolo Maggi

Description John Spray 2004-05-03 19:08:15 UTC

1. Open a file containing some non-ASCII characters such as „ (&bdquo; in HTML,
the German opening quote).
2. Do a "find" for such a character.  I'm doing this by selecting the character
and then choosing Search->Find.
3. Notice that for some (not all) instances of the character, a later character
(2 to 4 places further on) is selected instead.

I have also noticed this with the ´ (&prime; in HTML) character.  In this case,
the find reliably selects both the prime character and the character that
follows it.

When the replace function is used, the erroneously selected char(s) are replaced.

Example document at http://icculus.org/~jcspray/dr.html

Screenshot illustrating bug at http://icculus.org/~jcspray/gedit_bug.png (note
that 'n' is selected incorrectly instead of the low-quote).

Comment 1 Paolo Maggi 2004-07-19 16:37:04 UTC

I can reproduce it using the attached file.

This is a gtksourceview bug.

Gustavo: what do you think?

Comment 2 Paolo Maggi 2004-07-19 16:38:18 UTC

Created attachment 29655
testcase

Comment 3 Gustavo Giráldez 2004-07-20 13:41:11 UTC

Damn find bugs ;-)  Sorry for the delay.  I'll take a look at this later today.

Comment 4 Gustavo Giráldez 2004-07-20 20:23:21 UTC

Ok, there is a difference in the way the prime ´ character is expanded during
normalization and in decomposition.  Normalization expands it to 2 UTF8 chars (3
bytes) while decomposition doesn't do anything to it.  AFAICS, normalization
treats the prime character as an accented space, and that's why it decomposes
it.  What I don't know is why decomposition does a different thing.  Sigh.
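
The mismatch described above can be reproduced outside GLib. The following is a minimal sketch in Python rather than C, assuming the character is U+00B4 as it appears in this report and that the "normalization" in question is a compatibility form such as NFKD, while "decomposition" is canonical only (NFD). The helper name describe is made up for illustration.

```python
import unicodedata

ch = "\u00b4"  # the character as it appears in the report (assumed U+00B4)

# Canonical decomposition only: U+00B4 has no canonical mapping.
canonical = unicodedata.normalize("NFD", ch)
# Canonical plus compatibility decomposition: U+00B4 maps to space + combining acute.
compat = unicodedata.normalize("NFKD", ch)


def describe(label, s):
    """Hypothetical helper: summarize length in characters and UTF-8 bytes."""
    codepoints = " ".join(f"U+{ord(c):04X}" for c in s)
    return f"{label}: {len(s)} char(s), {len(s.encode('utf-8'))} byte(s): {codepoints}"


print(describe("NFD ", canonical))
print(describe("NFKD", compat))
```

Under those assumptions, NFD leaves the character as a single code point, while NFKD yields U+0020 U+0301, which is 2 characters and 3 bytes in UTF-8. That matches the "accented space" observation: a canonical-only decomposition has nothing to do, whereas a compatibility normalization expands the character.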

It can't be that hard to be UTF8 correct.  I'll work out a patch later,
replacing the calls to g_unicode_canonical_decomposition() with normalizations
of 1-character strings.
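
A rough sketch of that idea, again in Python rather than C and not taken from the actual patch: each character of the text is normalized on its own, so the per-character lengths used to map a match back to the original text come from the same operation that produced the searched string. The names find_normalized and naive_find are hypothetical, and NFKD is an assumption about which normalization form is involved.

```python
import bisect
import unicodedata

FORM = "NFKD"  # assumed normalization form


def find_normalized(haystack, needle):
    """Return (start, end) offsets into the ORIGINAL haystack, or None."""
    norm_needle = unicodedata.normalize(FORM, needle)
    # Normalize one character at a time so each expansion length is known.
    pieces = [unicodedata.normalize(FORM, ch) for ch in haystack]
    norm_haystack = "".join(pieces)

    # starts[i] is the offset in norm_haystack where original char i begins;
    # the final entry marks the end of the normalized text.
    starts = [0]
    for piece in pieces:
        starts.append(starts[-1] + len(piece))

    hit = norm_haystack.find(norm_needle)
    if hit < 0:
        return None
    hit_end = hit + len(norm_needle)
    start = bisect.bisect_right(starts, hit) - 1  # char containing the hit
    end = bisect.bisect_left(starts, hit_end)     # first char past the hit
    return start, end


def naive_find(haystack, needle):
    """Buggy variant: treats normalized offsets as original offsets."""
    norm_haystack = unicodedata.normalize(FORM, haystack)
    norm_needle = unicodedata.normalize(FORM, needle)
    hit = norm_haystack.find(norm_needle)
    return None if hit < 0 else (hit, hit + len(norm_needle))


text = "a\u00b4b"
good = find_normalized(text, "\u00b4")
bad = naive_find(text, "\u00b4")
print(repr(text[good[0]:good[1]]), repr(text[bad[0]:bad[1]]))
```

With this input the naive variant selects the character plus the one after it, which resembles the symptom in the description, while the per-character mapping selects only the character itself. One simplification to note: concatenating per-character normalizations skips the reordering of combining marks across character boundaries, so a real implementation would need to handle that separately.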

Comment 5 Gustavo Giráldez 2004-07-21 22:09:10 UTC

I committed a fix for this.  Can you guys please test it and if it works
correctly close the bug?  Thanks.

Comment 6 Paolo Maggi 2004-07-22 08:47:24 UTC

Great Gustavo!
Your patch seems to fix the problem for me.

Closing.