GNOME Bugzilla – Bug 423274
find/search doesn't normalize
Last modified: 2009-02-14 13:41:21 UTC
When searching for text in a file, precomposed characters are not matched against their canonically equivalent decomposed sequences in Unicode, or the other way round.
Example:
- the string has "école" with <U+00E9 LATIN SMALL LETTER E WITH ACUTE>.
- the search is for "école" with <U+0065 LATIN SMALL LETTER E; U+0301 COMBINING ACUTE ACCENT>.
The first string does not match the search, but it should. g_utf8_normalize() should be applied to both strings before comparing them.
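The idea of normalizing both sides before comparing can be sketched as follows. This is only an illustration in Python using the standard unicodedata module, not gtranslator code; the helper name contains_normalized is hypothetical, and in C the same step would go through g_utf8_normalize().

```python
import unicodedata


def contains_normalized(haystack: str, needle: str, form: str = "NFC") -> bool:
    """Return True if needle occurs in haystack once both are normalized.

    Hypothetical helper for illustration: both strings are brought to the
    same normalization form so canonically equivalent text compares equal.
    """
    return unicodedata.normalize(form, needle) in unicodedata.normalize(form, haystack)


precomposed = "\u00e9cole"   # U+00E9 followed by "cole"
decomposed = "e\u0301cole"   # U+0065 U+0301 followed by "cole"

print(decomposed in precomposed)                      # plain comparison
print(contains_normalized(precomposed, decomposed))   # normalized comparison
```

The plain substring test fails because the code point sequences differ, while the normalized test succeeds in both directions.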
blocks meta bug 423036
Hi, I can try to fix this bug, but I don't know how to test it. Could you give an example or a small po file to test with?
Created attachment 87570: fr.po from gtk+, all strings are NFD. Here's a copy of fr.po from gtk+, all strings are NFD. A simple example is a search for "é" <U+00E9 LATIN SMALL LETTER E WITH ACUTE>, which is not present in the file since its NFD is "é" <U+0065 LATIN SMALL LETTER E; U+0301 COMBINING ACUTE ACCENT>. Searching for one should match the other, both ways. Also, you'll notice there's a bug in the highlighting if you search for "é" <U+0065 LATIN SMALL LETTER E; U+0301 COMBINING ACUTE ACCENT>. For some reason the combining characters are counted incorrectly and the highlighted matches are off. Should that be another bug report? Something else to digest: should all input be NFC'ed, i.e. should all files written by gtranslator be in NFC (the normalization form recommended by the W3C)?
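One plausible cause of the shifted highlighting, which this thread does not confirm, is that match offsets are computed on a normalized copy and then applied to the original text, where combining characters take extra code points. The Python sketch below is only an illustration under that assumption; find_span is a hypothetical helper that returns offsets valid in the original string, and it is deliberately simple (quadratic), not tuned.

```python
import unicodedata


def find_span(text: str, needle: str, form: str = "NFC"):
    """Return (start, end) code point offsets in the ORIGINAL text of the
    first slice canonically equivalent to needle, or None if there is none."""
    target = unicodedata.normalize(form, needle)
    n = len(text)
    for start in range(n):
        if unicodedata.combining(text[start]):
            continue  # do not start inside a combining sequence
        for end in range(start + 1, n + 1):
            if end < n and unicodedata.combining(text[end]):
                continue  # do not end inside a combining sequence
            if unicodedata.normalize(form, text[start:end]) == target:
                return start, end
    return None


text = "cafe\u0301 noir"   # NFD: "e" followed by U+0301
needle = "\u00e9"          # precomposed search term

# Offsets taken from a normalized copy cover one code point only.
naive_start = unicodedata.normalize("NFC", text).find(needle)
print((naive_start, naive_start + len(needle)))

# Offsets computed against the original cover the base letter and the accent.
print(find_span(text, needle))
```

The first pair of offsets would highlight only the base letter in the original text; the second pair includes the combining accent.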
I agree with you, the highlighting doesn't work correctly. If you want, you can report it as a new bug and I will try to fix both.
Created attachment 88948: This patch normalizes the find function. I think the find is now normalized, but the highlighting is not fixed yet. Could you test this patch? If it works well, I will then try to repair the highlighting.
(In reply to comment #5)
> Created an attachment (id=88948)
> This patch normalizes the find function
>
> I think the find is now normalized, but the highlighting is not
> fixed yet. Could you test this patch? If it works well, I will then try to
> repair the highlighting.

I'm not able to apply the patch to my copy of SVN. Is parse.c different?
(In reply to comment #6)
> I'm not able to apply the patch to my copy of SVN.
> Is parse.c different?

This patch was generated from version 1_1_7 in SVN. I only modified the lines that appear in the patch.
OK. The search works for equivalent strings with the patch. But the patch uses G_NORMALIZE_DEFAULT, which is NFD, so everything is decomposed. This also affects the saved files: all current po files with precomposed characters will be saved with decomposed characters after the patch. I'd suggest using G_NORMALIZE_DEFAULT_COMPOSE (NFC) to avoid this major change and to let legacy software that doesn't support Unicode equivalence, like the current version, keep working with the saved files.
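To make the difference between the two modes concrete: in GLib, G_NORMALIZE_DEFAULT corresponds to NFD and G_NORMALIZE_DEFAULT_COMPOSE to NFC. The Python sketch below uses the standard unicodedata module to show what each form would do to text that ends up in a saved file; it is an illustration only, not the patch itself.

```python
import unicodedata

original = "\u00e9cole"  # precomposed, as found in most existing po files

nfd = unicodedata.normalize("NFD", original)  # like G_NORMALIZE_DEFAULT
nfc = unicodedata.normalize("NFC", original)  # like G_NORMALIZE_DEFAULT_COMPOSE

# NFD splits U+00E9 into U+0065 U+0301, so the stored text changes.
print([f"U+{ord(c):04X}" for c in nfd])
print(len(original.encode("utf-8")), len(nfd.encode("utf-8")))

# NFC leaves already precomposed text untouched.
print(nfc == original)
```

With NFC, files that already use precomposed characters are written back unchanged, which is the behaviour argued for above.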
Created attachment 89396: find and replace normalize. I changed the normalization as suggested by Denis Jacquerye and added the normalization to the replace function.
Thank you Pablo. This works fine. The only remaining bug is wrong match highlighting.
Applied the patch to dialogs.c on svn, but I can't apply it to parse.c because there isn't an append_line function.
(In reply to comment #11)
> Applied the patch to dialogs.c on svn, but I can't apply it to parse.c because there
> isn't an append_line function.

This patch was made against the stable version (1.1.7).
Created attachment 94241: This is a bit patch to trunk so that find/search is normalized. I changed the normalization mode passed to g_utf8_normalize() with respect to the stable version, because gettext's functions use G_NORMALIZE_DEFAULT instead of G_NORMALIZE_DEFAULT_COMPOSE.
Applied.
Applied patch in branch gtranslator_1_1_8. I'm keeping this bug open because the highlighting doesn't work yet.
Version 1.1.8 is no longer maintained. This is fixed in the trunk version.