Bug 339805 - Find/Replace does not differentiate accented chars from normal Latin chars in it/es locales
Status: RESOLVED FIXED
Product: gtksourceview
Classification: Platform
Component: General
Version: git master
Hardware: Other
OS: All
Importance: Normal normal
Target Milestone: ---
Assigned To: GTK Sourceview maintainers
GTK Sourceview maintainers
Depends on: 348754
Blocks:
 
 
Reported: 2006-04-26 12:21 UTC by Matt Keenan (IRC:MattMan)
Modified: 2007-09-04 15:02 UTC
See Also:
GNOME target: ---
GNOME version: 2.13/2.14


Attachments
gtksourceiter.c (17.92 KB, text/plain), 2007-01-10 21:40 UTC, Yevgen Muntyan
patch (1.72 KB, patch), 2007-01-10 21:43 UTC, Yevgen Muntyan, status: none
patch (3.27 KB, patch), 2007-01-11 01:16 UTC, Yevgen Muntyan, status: accepted-commit_now

Description Matt Keenan (IRC:MattMan) 2006-04-26 12:21:28 UTC
Please describe the problem:
Find/Replace in gedit does not differentiate between normal Latin characters and
accented ones.

Steps to reproduce:
1. Open a file containing localized characters in the gedit application.
2. Enter the following words:
- sí (LATIN SMALL LETTER I WITH ACUTE) – means “yes” in English
- si (Note: i is without an accent mark) – means “if” in English
3. Click on Replace.
4. The Replace dialog window is launched.
5. Type the localized string si (Note: i without an accent, “if” in English)
and click on the Find button.

Actual results:
The string sí (Note: i with an accent, “yes” in English) is highlighted in the
document, although it is not the string being searched for. Furthermore, if a
Replace with string is given, sí is also replaced with that string.

Expected results:
Find/Replace should only highlight/replace specific characters that are searched
for.

Does this happen every time?
yes

Other information:
Comment 1 Paolo Maggi 2007-01-10 18:17:18 UTC
<behdad> paolo: ok, how about after the strncmp, checking that the next char in normalized_s1 is not a g_unichar_iszerowidth()?
<behdad> paolo: g_unichar_iszerowidth() is new in trunk.  though, it may match to some chars you don't want to.
<paolo> hmm... what do I obtain doing so?
<behdad> paolo: "si" will not match to "si followed by an accent"
<behdad> paolo: you just want the ISZEROWIDTHTYPE check from g_unichar_iszerowidth() btw.
<paolo> oh, since normalization split sì in si' or something like that
<behdad> yeah
<paolo> yep, it could be the problem
<paolo> behdad: thanks
<paolo> you said I need only the "if (G_UNLIKELY (ISZEROWIDTHTYPE (c))) return TRUE;" part of the function
<paolo> right?
<behdad> paolo: yeah. or return FALSE, depending on what the return value means.
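The false match paolo and behdad describe (normalization splitting the accented letter into a base letter plus a separate mark) can be reproduced in a few lines of plain C. This is an illustrative sketch only: instead of calling g_utf8_normalize(), it hard-codes the decomposed form of "sí" (U+0069 followed by U+0301 COMBINING ACUTE ACCENT), and naive_prefix_match is a hypothetical name standing in for the strncmp() step of the search.

```c
#include <string.h>

/* Decomposed (NFD-style) "sí": 's', 'i', then U+0301 COMBINING ACUTE
 * ACCENT, which is 0xCC 0x81 in UTF-8. A decomposing mode such as
 * G_NORMALIZE_ALL turns the precomposed U+00ED into this shape. */
static const char decomposed_si_acute[] = "si\xCC\x81";

/* Hypothetical naive prefix test: compares bytes only, so it cannot
 * see that the text goes on with a mark belonging to the last
 * matched letter. */
static int naive_prefix_match(const char *text, const char *needle)
{
    return strncmp(text, needle, strlen(needle)) == 0;
}
```

With these definitions, naive_prefix_match(decomposed_si_acute, "si") returns 1, which is the wrong highlight described in the report.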
Comment 2 Yevgen Muntyan 2007-01-10 18:40:17 UTC
Interesting that pcre copes with this case correctly, doesn't match "sí" when looking for "si".
Comment 3 Yevgen Muntyan 2007-01-10 18:47:18 UTC
G_NORMALIZE_ALL_COMPOSE instead of G_NORMALIZE_ALL helps here too. Not in all cases perhaps.
Comment 4 Yevgen Muntyan 2007-01-10 21:27:32 UTC
<muntyan> behdad: UCD.html says "Changed general category of Zero Width Space (U+200B) from Zs to Cf.", so Zero Width Space falls into G_UNICODE_FORMAT?
<behdad> muntyan: yes
<muntyan> behdad: but don't we want to ignore it when searching for text? i.e. to treat it not like accent mark
<muntyan> (ISZEROWIDTHTYPE includes G_UNICODE_FORMAT)
<behdad> muntyan: in that case, my fault.  just check for the _MARK types.
<muntyan> behdad: ISMARK, right?
<behdad> muntyan: yeah, exactly.
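A minimal sketch of the check agreed on above, written in plain C so it runs without GLib. The names utf8_get_char, is_combining_mark and exact_prefix_match are hypothetical. The discussion points at g_unichar_type() and the GUnicodeType mark categories; here is_combining_mark only covers the Combining Diacritical Marks block (U+0300 to U+036F) as a simplifying assumption, so it is an approximation of the ISMARK test, not a replacement for it.

```c
#include <string.h>

/* Decode one UTF-8 sequence at s into a code point.
 * Assumes well-formed UTF-8 (as g_utf8_get_char() also does). */
static unsigned long utf8_get_char(const char *s)
{
    const unsigned char *p = (const unsigned char *) s;

    if (p[0] < 0x80)
        return p[0];
    if ((p[0] & 0xE0) == 0xC0)
        return ((unsigned long) (p[0] & 0x1F) << 6) | (p[1] & 0x3F);
    if ((p[0] & 0xF0) == 0xE0)
        return ((unsigned long) (p[0] & 0x0F) << 12)
             | ((unsigned long) (p[1] & 0x3F) << 6) | (p[2] & 0x3F);
    return ((unsigned long) (p[0] & 0x07) << 18)
         | ((unsigned long) (p[1] & 0x3F) << 12)
         | ((unsigned long) (p[2] & 0x3F) << 6) | (p[3] & 0x3F);
}

/* Hypothetical stand-in for an ISMARK check: only U+0300..U+036F is
 * treated as a mark here (assumption; the real check would use the
 * full Unicode general category). */
static int is_combining_mark(unsigned long c)
{
    return c >= 0x0300 && c <= 0x036F;
}

/* Prefix match that rejects a hit when the text continues with a
 * combining mark, i.e. when the last matched letter is really the
 * base of an accented character. Both strings are assumed to have
 * been normalized the same (decomposed) way beforehand. */
static int exact_prefix_match(const char *text, const char *prefix)
{
    size_t len = strlen(prefix);

    if (strncmp(text, prefix, len) != 0)
        return 0;
    if (text[len] == '\0')
        return 1;
    return !is_combining_mark(utf8_get_char(text + len));
}
```

Searching for "si" now fails on decomposed "sí" but still succeeds on "si" and on "sin", because only a following mark, not any following character, blocks the match.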
Comment 5 Yevgen Muntyan 2007-01-10 21:40:37 UTC
Created attachment 79991
gtksourceiter.c

I've cooked this.
Comment 6 Yevgen Muntyan 2007-01-10 21:43:47 UTC
Created attachment 79992
patch

Sorry, this is what I wanted to post.
Comment 7 Paolo Borelli 2007-01-11 00:24:28 UTC
It's late so I am probably missing something obvious, but what does this part of the patch have to do with the rest?

+#define g_utf8_strcasestr	gtk_source_strcasestr
+#define g_utf8_strrcasestr	gtk_source_strrcasestr
+#define g_utf8_caselessnmatch	gtk_source_caselessnmatch

The other part makes sense to me (as much as I understood what behdad said); the only nitpick is that we usually do not use 'inline'[1].

1) I understand that it makes sense to inline the function since it's used only in that place, but as far as I know (a) gcc will figure that out and (b) inline is not available on all the compilers we support (Sun, etc.).
Comment 8 Yevgen Muntyan 2007-01-11 00:39:21 UTC
(In reply to comment #7)
> It's late so I am probably missing something obvious, but what does this part of
> the patch have to do with the rest?
> 
> +#define g_utf8_strcasestr      gtk_source_strcasestr
> +#define g_utf8_strrcasestr     gtk_source_strrcasestr
> +#define g_utf8_caselessnmatch  gtk_source_caselessnmatch

Um, I didn't clean up the patch. That's what I have locally to avoid a name clash with GLib.

> The other part makes sense to me (as much as I understood what behdad said);
> the only nitpick is that we usually do not use 'inline'[1].

C++-ism, can't get rid of it. Totally agree it should not be there.
Comment 9 Yevgen Muntyan 2007-01-11 01:16:34 UTC
Created attachment 80004
patch

Real thing now (not sure if it's nice though, as I said it's "what I cooked here").
Comment 10 Paolo Maggi 2007-01-11 08:54:04 UTC
Yevgen: thanks for the patch.

It probably solves the specific problem reported here, so it can go in as a first step.
I don't think it is generic enough to solve, for example, the problem of searching "s" in a text containing "ß".

Maybe Behdad has another great idea on how to solve this.

Please commit the patch to both HEAD and the latest branch.
Comment 11 Paolo Borelli 2007-01-11 08:59:14 UTC
/me puts on his pain-in-the-ass hat

1 - can you add a little comment above 

+	return type != G_UNICODE_NON_SPACING_MARK &&
+		type != G_UNICODE_ENCLOSING_MARK &&
+		type != G_UNICODE_COMBINING_MARK;

 saying what we are doing


2 - for the namespace clash: what about gtk_source_utf8_strcasestr etc.? (that is, keep utf8 in the name)
Comment 12 Yevgen Muntyan 2007-02-10 16:07:42 UTC
Committed, finally. Anyway, what's the problem with searching "s" in a text containing "ß"? And what are the other problems with search? It always worked for me in Russian, so I assumed it was working fine :)
Comment 13 Yevgen Muntyan 2007-09-04 15:02:50 UTC
Didn't close it back then because I couldn't close it.