GNOME Bugzilla – Bug 416390
changecase doesn't change some characters
Last modified: 2014-09-25 17:04:06 UTC
If a selection contains a character that has no precomposed uppercase variant, like ẗ <U+1e97>, any change of case with the changecase plugin wipes out the whole selection. A fix for ẗ, which could probably be generalized to all characters without precomposed case variants, is to decompose the character, change case, and recompose.
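The decompose, change case, recompose idea can be sketched as follows. This is only an illustration in Python, not the plugin's code: `simple_upper` is a hypothetical helper that emulates a mapping from one code point to one code point, and `upper_via_decomposition` is a made-up name.

```python
import unicodedata


def simple_upper(ch):
    """Emulate a one-code-point case mapping (an assumption for this sketch).

    If the full uppercase form needs more than one code point,
    the character is returned unchanged.
    """
    up = ch.upper()
    return up if len(up) == 1 else ch


def upper_via_decomposition(text):
    # 1. Decompose: U+1E97 becomes 't' followed by U+0308 (combining diaeresis).
    decomposed = unicodedata.normalize("NFD", text)
    # 2. Change case one code point at a time.
    converted = "".join(simple_upper(ch) for ch in decomposed)
    # 3. Recompose wherever a precomposed form exists.
    return unicodedata.normalize("NFC", converted)
```

Note that recomposing normalizes the whole string, so a real implementation would have to decide whether changing the normalization form of untouched text is acceptable.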
Could you provide a patch?
Actually, decomposing/composing would not help for ß, which has no decomposition. In any case, g_unichar_toupper() chokes on ß and ẗ instead of returning the characters themselves. g_utf8_strup() and g_utf8_strdown() should be used instead; they seem to handle these special cases well.
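To illustrate why a string-level conversion behaves differently from a per-character one, here is a small Python sketch. It does not call GLib: Python's `str.upper()` stands in for a string-level function, and `upper_per_char` is a hypothetical helper that emulates an API limited to one code point in, one code point out.

```python
def upper_per_char(text):
    """Uppercase code point by code point (emulated, see assumptions above).

    Characters whose uppercase form is longer than one code point
    are left unchanged, since a single code point cannot hold the result.
    """
    out = []
    for ch in text:
        up = ch.upper()
        out.append(up if len(up) == 1 else ch)
    return "".join(out)


def upper_whole_string(text):
    # A string-level conversion may return more code points than it was given.
    return text.upper()


if __name__ == "__main__":
    for sample in ("stra\u00dfe", "\u1e97"):
        print(repr(upper_per_char(sample)), repr(upper_whole_string(sample)))
```

The point of the sketch is that the result can be longer than the input, so any caller that replaces text in a buffer has to work with strings rather than with individual characters.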
Renaming the bug, since glib's functions now return what they should and strings are no longer wiped out. However, only characters with single-character case variants are modified, so ß and ẗ are still not changed as expected. Changecase is currently character based; would it be OK to work on strings if it was done per word? That would work with g_utf8_strup() and g_utf8_strdown(). Something like g_utf8_strtitle() has its place in glib, where the Unicode data already is.
Yes, working on strings is the proper solution... we need to experiment a bit to find the best length of string to use with regard to performance. I am not sure 'word' is the best size (since it varies and we can have corner cases of very long words). Maybe MIN (wordlen, N), but at that point I am not sure it's worth the effort and maybe we should just use N. Obviously N should not be in bytes, but in number of characters. About g_utf8_strtitle(), I am not sure it belongs in glib, since titlecase is not well defined: we just capitalize the first letter of each word, but sometimes people would like it to follow localized capitalization rules (e.g. Phantom of the Opera vs Phantom Of The Opera, Murders in the Rue Morgue vs Murders In The Rue Morgue, etc.)
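A rough sketch of the fixed-size approach, with N counted in characters rather than bytes, could look like this. It is written in Python purely for illustration; `change_case_in_chunks`, `chunk_chars` and the default of 4096 are assumptions of this sketch, not values taken from the plugin.

```python
def change_case_in_chunks(text, convert, chunk_chars=4096):
    """Apply `convert` to `text` in pieces of at most `chunk_chars` characters.

    Python strings are indexed by code point, so the slices below never
    cut a multi-byte UTF-8 sequence in half.
    """
    if chunk_chars < 1:
        raise ValueError("chunk_chars must be at least 1")
    pieces = []
    for start in range(0, len(text), chunk_chars):
        pieces.append(convert(text[start:start + chunk_chars]))
    return "".join(pieces)


if __name__ == "__main__":
    # The byte length and the character length differ for non-ASCII text.
    print(len("\u00df"), len("\u00df".encode("utf-8")))
    print(change_case_in_chunks("stra\u00dfe", str.upper, chunk_chars=2))
```

A fixed chunk can still split a base letter from a following combining mark, or a word in the middle, so context-sensitive rules and title case would need chunk boundaries chosen more carefully than this.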
Paolo (Maggi), btw, since these methods turn out to be not as trivial as they may first look, and since they are something that many editors should have, what do you think of moving them to methods of gtksourcebuffer?
Confirming on gedit 3.4.1, ẗ is still not uppercased.
Probably doing the conversion line by line will give better results.
Fixed: https://git.gnome.org/browse/gtksourceview/commit/?id=fd545bd9e501df65e9c67c5d679d2b0c5fc5344f In the end it was not done line by line, since that is not suitable for toggle-case and title-case.