Bug 572952 - Add g_utf8_simple_casefold() for simple case folding (instead of full case folding)
Status: RESOLVED OBSOLETE
Product: glib
Classification: Platform
Component: i18n
Version: 2.37.x
Hardware: Other Linux
Importance: Normal enhancement
Target Milestone: ---
Assigned To: gtkdev
Depends on:
Blocks: 703165
 
 
Reported: 2009-02-24 08:51 UTC by Sebastien Bacher
Modified: 2018-05-24 11:46 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Test g_utf8_casefold() (268 bytes, text/plain)
2013-07-03 10:02 UTC, Sébastien Wilmet

Description Sebastien Bacher 2009-02-24 08:51:10 UTC
This bug was originally reported at https://bugs.launchpad.net/bugs/332321

"Version of Package: 2.24.2-0ubuntu1
I expect: When I search and replace "ß" (to change to ß for html) I want all "ss" left unchanged.
What happens: "ss" is also replaced, as if it were an "ß""
Comment 1 André Klapper 2012-07-30 15:35:41 UTC
Still valid in 3.2
Comment 2 Paolo Borelli 2012-07-30 15:52:56 UTC
Search and replace is done in GTK itself.
Comment 3 Sébastien Wilmet 2013-06-27 13:43:38 UTC
*** Bug 703165 has been marked as a duplicate of this bug. ***
Comment 4 Sébastien Wilmet 2013-07-03 10:02:52 UTC
Created attachment 248295
Test g_utf8_casefold()
Comment 5 Sébastien Wilmet 2013-07-03 10:06:44 UTC
The problem comes from g_utf8_casefold().

It transforms both ß (lowercase eszett) and ẞ (uppercase eszett) into "ss".

It should instead leave ß as-is and transform ẞ (uppercase eszett) into ß (lowercase eszett).
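
To make the expected behaviour concrete, here is a minimal sketch in plain C of the per-character mapping described above. `simple_casefold_char()` is a hypothetical helper, not a GLib function, and it only covers ASCII letters and the two eszett code points; a real implementation would presumably be driven by the Unicode tables.

```c
#include <stdint.h>

/* Hypothetical helper (not part of GLib): simple, one-to-one case
 * folding restricted to ASCII letters and the two eszett code points. */
static uint32_t
simple_casefold_char (uint32_t c)
{
  if (c >= 'A' && c <= 'Z')
    return c + ('a' - 'A');   /* ASCII uppercase -> lowercase */
  if (c == 0x1E9E)            /* ẞ LATIN CAPITAL LETTER SHARP S */
    return 0x00DF;            /* ß LATIN SMALL LETTER SHARP S */
  return c;                   /* everything else, including ß, is unchanged */
}
```

With this mapping, U+00DF (ß) stays U+00DF and U+1E9E (ẞ) becomes U+00DF, so neither of them ever compares equal to the two-character sequence "ss".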
Comment 6 Sébastien Wilmet 2013-07-03 10:28:42 UTC
From http://www.unicode.org/Public/5.1.0/ucd/UCD.html
and http://www.unicode.org/Public/5.1.0/ucd/UnicodeData.txt

We can see those lines:
> 00DF;LATIN SMALL LETTER SHARP S;Ll;0;L;;;;;N;;German;;;
> 1E9E;LATIN CAPITAL LETTER SHARP S;Lu;0;L;;;;;N;;;;00DF;

I think it means what I said above: ß -> ß and ẞ -> ß.

So the root of the problem is that gunichartables.h is not up-to-date, and should be regenerated with the latest Unicode spec.
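
For reference, the two UnicodeData.txt records quoted above can be read mechanically. UnicodeData.txt is a semicolon-separated file where field 0 is the code point and field 13 (counting from 0) is the simple lowercase mapping; an empty field 13 means the character has no separate lowercase form. The sketch below is plain C with hypothetical helper names (`find_field()`, `parse_simple_lowercase()`); it is not how gen-unicode-tables.pl works, only an illustration of how the quoted lines are interpreted.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Return a pointer to the start of field `index` (0-based) of a
 * semicolon-separated record, or NULL if the record has fewer fields.
 * Empty fields are counted, which is why strtok() is not used here. */
static const char *
find_field (const char *line, int index)
{
  while (index-- > 0)
    {
      line = strchr (line, ';');
      if (line == NULL)
        return NULL;
      line++;   /* skip the ';' itself */
    }
  return line;
}

/* Hypothetical helper: read the code point (field 0) and its simple
 * lowercase mapping (field 13) from one UnicodeData.txt record.
 * Returns 1 on success, 0 if the record is too short. */
static int
parse_simple_lowercase (const char *line, uint32_t *code, uint32_t *lower)
{
  const char *field = find_field (line, 13);
  char *end;

  if (field == NULL)
    return 0;

  *code = (uint32_t) strtoul (line, NULL, 16);
  *lower = (uint32_t) strtoul (field, &end, 16);
  if (end == field)   /* no hex digits: the field is empty */
    *lower = *code;   /* the character maps to itself */
  return 1;
}
```

Applied to the two records above, this yields 00DF -> 00DF and 1E9E -> 00DF, which matches the reading "ß -> ß and ẞ -> ß".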
Comment 8 Sébastien Wilmet 2013-07-03 10:52:15 UTC
gen-unicode-tables.pl in GLib uses a different file for case folding:

http://www.unicode.org/Public/UCD/latest/ucd/CaseFolding.txt

We can see these lines:
> 00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
> 1E9E; F; 0073 0073; # LATIN CAPITAL LETTER SHARP S
> 1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S

It seems that GLib uses the full case folding (status F) instead of the simple case folding (status S).

0073 is the letter 's'.

So the desired behavior for gtk_text_iter_forward_search() (with the case-insensitive flag) is simple case folding, not full case folding.
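
As an illustration of the difference, here is a rough sketch in plain C that parses one CaseFolding.txt line and tells whether it belongs to the simple or the full folding. According to the header of CaseFolding.txt, simple case folding uses the entries with status C and S, and full case folding uses C and F. The `FoldEntry` type and the function names are hypothetical, chosen for this example only; this is not the code of gen-unicode-tables.pl.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_FOLD_LEN 3   /* a full folding expands to at most three code points */

typedef struct
{
  uint32_t code;
  char status;                      /* 'C', 'F', 'S' or 'T' */
  uint32_t mapping[MAX_FOLD_LEN];
  int mapping_len;
} FoldEntry;

/* Parse a line of the form "<code>; <status>; <mapping>; # <name>".
 * Returns 1 on success, 0 for comments and malformed lines. */
static int
parse_fold_line (const char *line, FoldEntry *entry)
{
  unsigned int code;
  char status;
  int consumed = -1;
  const char *p;

  if (sscanf (line, "%x; %c; %n", &code, &status, &consumed) != 2
      || consumed < 0)
    return 0;

  entry->code = code;
  entry->status = status;
  entry->mapping_len = 0;

  /* Read the space-separated hex code points up to the next ';'. */
  p = line + consumed;
  while (*p != ';' && *p != '\0' && entry->mapping_len < MAX_FOLD_LEN)
    {
      char *end;
      unsigned long value = strtoul (p, &end, 16);

      if (end == p)
        return 0;
      entry->mapping[entry->mapping_len++] = (uint32_t) value;
      p = end;
      while (*p == ' ')
        p++;
    }
  return *p == ';';
}

/* Simple folding: statuses C (common) and S (simple). */
static int
used_for_simple_folding (const FoldEntry *entry)
{
  return entry->status == 'C' || entry->status == 'S';
}

/* Full folding: statuses C (common) and F (full). */
static int
used_for_full_folding (const FoldEntry *entry)
{
  return entry->status == 'C' || entry->status == 'F';
}
```

For the three lines quoted above, only "1E9E; S; 00DF" is selected for simple folding. ß (00DF) has no C or S entry, so it maps to itself, and neither character is expanded to "ss".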
Comment 9 GNOME Infrastructure Team 2018-05-24 11:46:07 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe to and participate further in the new issue on our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/199.