Bug 416390 - changecase doesn't change some characters
Status: RESOLVED FIXED
Product: gtksourceview
Classification: Platform
Component: General
Version: unspecified
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: GTK Sourceview maintainers
QA Contact: GTK Sourceview maintainers
Depends on:
Blocks:
 
 
Reported: 2007-03-09 10:34 UTC by Denis Jacquerye
Modified: 2014-09-25 17:04 UTC
See Also:
GNOME target: ---
GNOME version: 3.3/3.4



Description Denis Jacquerye 2007-03-09 10:34:31 UTC
If a selection has a character that doesn't have an uppercase variant like ẗ <U+1e97>, any change of case with the changecase plugin wipes out the whole selection.

A possible fix for ẗ, which could probably be generalized to characters without precomposed case variants: decompose the character, change case, and recompose.
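
The decompose / change case / recompose idea can be sketched as follows. This is only an illustration in Python, not the plugin's C code: the helper name `upper_via_decomposition` is hypothetical, and keeping a code point unchanged when its uppercase form is not a single code point is an assumption made here to mimic a one-to-one character mapping.

```python
import unicodedata


def upper_via_decomposition(text: str) -> str:
    """Decompose (NFD), uppercase code point by code point, recompose (NFC)."""
    decomposed = unicodedata.normalize("NFD", text)
    mapped = []
    for ch in decomposed:
        up = ch.upper()
        # Assumption: emulate a one-to-one mapping by keeping the original
        # code point whenever its uppercase form is not a single code point.
        mapped.append(up if len(up) == 1 else ch)
    return unicodedata.normalize("NFC", "".join(mapped))


# U+1E97 (ẗ) decomposes to t + U+0308, so the base letter can be uppercased.
t_diaeresis = upper_via_decomposition("\u1e97")
# é decomposes to e + U+0301 and recomposes to the precomposed É (U+00C9).
e_acute = upper_via_decomposition("\u00e9")
```

Since there is no precomposed uppercase T with diaeresis, the result for ẗ stays as the two code points T and U+0308 after recomposition.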
Comment 1 Paolo Borelli 2007-03-10 16:39:57 UTC
could you provide a patch?
Comment 2 Denis Jacquerye 2007-03-12 17:34:36 UTC
Actually, decomposing/composing would not help for ß.

In any case, g_unichar_toupper() chokes on ß and ẗ instead of returning the characters themselves.

g_utf8_strup() and g_utf8_strdown() should be used instead; they seem to handle these special cases well.
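
The difference between converting one character at a time and converting a whole string can be illustrated with a small sketch. It is written in Python rather than against GLib, so it shows only the underlying Unicode behaviour: `upper_per_character` is a hypothetical emulation of a one-code-point-in, one-code-point-out mapping, while `str.upper()` stands in for a string-based conversion.

```python
def upper_per_character(text: str) -> str:
    """Emulate a per-character mapping that can only return one code point."""
    out = []
    for ch in text:
        up = ch.upper()
        # Assumption: when the uppercase form needs several code points,
        # a per-character API has nothing to return but the input itself.
        out.append(up if len(up) == 1 else ch)
    return "".join(out)


def upper_whole_string(text: str) -> str:
    """String-based conversion, which may change the length of the text."""
    return text.upper()


sample = "stra\u00dfe \u1e97"            # "straße ẗ"
per_char = upper_per_character(sample)   # ß and ẗ are left untouched
whole = upper_whole_string(sample)       # ß becomes SS, ẗ becomes T + U+0308
```

The string-based result is longer than the input, which is why a conversion that replaces characters one for one cannot produce it.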
Comment 3 Denis Jacquerye 2007-03-23 02:10:33 UTC
Renaming the bug: glib's functions now return what they should, so strings are no longer wiped out. However, only characters with single-character case variants are modified, so ß and ẗ are still not changed as expected.

Changecase is currently character-based. Would it be OK to work on strings instead, one word at a time?
That would work with g_utf8_strup() and g_utf8_strdown(). Something like g_utf8_strtitle() would have its place in glib, where the Unicode data already is.
Comment 4 Paolo Borelli 2007-03-23 08:35:23 UTC
Yes, working on strings is the proper solution... we need to experiment a bit to find the best string length to use with regard to performance.
I am not sure 'word' is the best size (since it varies and we can have corner cases of very long words). Maybe MIN (wordlen, N), but at that point I am not sure it's worth the effort, and maybe we should just use N. Obviously N should not be in bytes, but in number of characters.
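
A rough sketch of the "just use N" option, with N counted in characters rather than bytes. The names `CHUNK_CHARS`, `chunks` and `convert_in_chunks` are hypothetical, the value 4096 is an arbitrary placeholder rather than a measured optimum, and Python is used only for illustration.

```python
from typing import Callable, Iterator

CHUNK_CHARS = 4096  # hypothetical N, in characters (not bytes)


def chunks(text: str, n: int) -> Iterator[str]:
    """Yield consecutive pieces of at most n characters."""
    for start in range(0, len(text), n):
        yield text[start:start + n]


def convert_in_chunks(text: str,
                      convert: Callable[[str], str],
                      n: int = CHUNK_CHARS) -> str:
    """Apply a string-based conversion to fixed-size pieces and rejoin them."""
    return "".join(convert(piece) for piece in chunks(text, n))


text = "stra\u00dfe \u1e97 " * 10
chunked_upper = convert_in_chunks(text, str.upper, n=3)
```

Python strings are indexed by code point, so slicing never cuts inside a multi-byte UTF-8 sequence. A fixed N can still split a word across two pieces, which matters for word-based conversions such as title case.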


About g_utf8_strtitle(): I am not sure it belongs in glib, since titlecase is not well defined. We just capitalize the first letter of each word, but sometimes people would like it to follow localized capitalization rules (e.g. Phantom of the Opera vs Phantom Of The Opera, Murders in the Rue Morgue vs Murders In The Rue Morgue, etc.)
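
The two titlecase behaviours from these examples can be sketched as follows. This is an illustration only: `SMALL_WORDS` is a hypothetical, English-only list, and real capitalization conventions are more involved than this.

```python
# Hypothetical, English-only list of words left in lowercase inside a title.
SMALL_WORDS = {"a", "an", "and", "in", "of", "the"}


def title_simple(text: str) -> str:
    """Capitalize the first letter of every word."""
    return " ".join(word.capitalize() for word in text.split(" "))


def title_with_small_words(text: str) -> str:
    """Capitalize every word except small words that are not in first position."""
    out = []
    for index, word in enumerate(text.split(" ")):
        if index != 0 and word.lower() in SMALL_WORDS:
            out.append(word.lower())
        else:
            out.append(word.capitalize())
    return " ".join(out)


simple = title_simple("murders in the rue morgue")
styled = title_with_small_words("murders in the rue morgue")
```

Both results are defensible titles, which is the point: a single library function would have to pick one convention.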
Comment 5 Paolo Borelli 2007-03-23 08:37:26 UTC
Paolo (Maggi), btw, since these methods turn out to be less trivial than they may first look, and since they are something that many editors should have, what do you think of moving them into gtksourcebuffer as methods?
Comment 6 Robert Roth 2012-07-28 12:30:09 UTC
Confirming on gedit 3.4.1: ẗ is still not uppercased.
Comment 7 Sébastien Wilmet 2014-09-25 13:13:39 UTC
Probably doing the conversion line by line will give better results.
Comment 8 Sébastien Wilmet 2014-09-25 17:04:06 UTC
Fixed:
https://git.gnome.org/browse/gtksourceview/commit/?id=fd545bd9e501df65e9c67c5d679d2b0c5fc5344f

In the end it was not done line by line, since that is not suitable for toggle-case and title-case.