Bug 710932 - g_str_tokenize_and_fold() does not play well with characters that can't be ascii-ified
Status: RESOLVED OBSOLETE
Product: glib
Classification: Platform
Component: general
Version: unspecified
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: gtkdev
QA Contact: gtkdev
Depends on:
Blocks:
Reported: 2013-10-26 17:01 UTC by Xavier Claessens
Modified: 2018-05-24 15:45 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Change implementation and API of g_str_tokenize_and_fold() (14.79 KB, patch)
2013-10-26 22:27 UTC, Xavier Claessens

Description Xavier Claessens 2013-10-26 17:01:18 UTC
When I call g_str_tokenize_and_fold("礼思朱", NULL, &alternates); alternates is ["", NULL]. The empty string appears because none of the string's characters can be ascii-ified. I think that if at least one character of a word can't be ascii-ified, the whole word should be omitted from alternates.
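Until the function itself omits such words, a caller could filter the empty entries out of the returned vector. The following is only a sketch of that workaround: strv_drop_empty() is a hypothetical helper, not part of GLib, and it assumes the vector is NULL-terminated and that the caller knows how its strings must be freed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a GLib function): compact a NULL-terminated
 * string vector in place, dropping empty strings. Dropped strings are
 * released with free_fn when it is non-NULL. Returns the new length. */
size_t
strv_drop_empty (char **strv, void (*free_fn) (void *))
{
  size_t read, write = 0;

  assert (strv != NULL);

  for (read = 0; strv[read] != NULL; read++)
    {
      if (strv[read][0] != '\0')
        strv[write++] = strv[read];   /* keep the non-empty token */
      else if (free_fn != NULL)
        free_fn (strv[read]);         /* release the empty token */
    }

  strv[write] = NULL;                 /* restore the terminator */
  return write;
}
```

With GLib this might be called as strv_drop_empty (alternates, g_free); right after g_str_tokenize_and_fold() returns, which would turn ["", NULL] into [NULL].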
Comment 1 Xavier Claessens 2013-10-26 17:21:11 UTC
Reading the implementation, I think it deserves a few more useful comments:

1) /* TODO: Invent time machine.  Converse with Mustafa Ataturk... */ --> that does not help in understanding what the code does... Why are those 2 Unicode characters special? If they really are special cases, I think they deserve a specific unit test.

2) g_new0 (gchar *, 0 + 1); and g_new (gchar *, n + 1); --> It is gchar ** actually. I know the sizeof is the same in any case, but that would help in understanding that we are building a strv, IMO.

3) ~decomposed[k] & 0x80 --> where does that magic number come from? I guess it means "decomposed[k] is ASCII", but I think that should be stated in a comment.
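For point 3, here is a minimal sketch of what such a check would mean if decomposed[] holds UTF-8 bytes (an assumption; the type is not stated above). In UTF-8, every byte of a non-ASCII character has its high bit (0x80) set, so a byte with that bit clear is plain 7-bit ASCII. The function names below are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True when the byte has its high bit (0x80) clear, i.e. it is 7-bit
 * ASCII. This is the same condition as `~c & 0x80` being non-zero. */
bool
byte_is_ascii (unsigned char c)
{
  return (c & 0x80) == 0;
}

/* True when every byte of the NUL-terminated string is ASCII. */
bool
str_is_all_ascii (const char *s)
{
  assert (s != NULL);

  for (; *s != '\0'; s++)
    if (!byte_is_ascii ((unsigned char) *s))
      return false;

  return true;
}
```

A named helper like this, or a one-line comment next to the mask, would make the intent explicit. GLib also provides g_str_is_ascii() for the whole-string case.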
Comment 2 Xavier Claessens 2013-10-26 22:27:09 UTC
Created attachment 258191
Change implementation and API of g_str_tokenize_and_fold()

The new code is what the Nokia N900 and Empathy are using. It has
been widely tested.

It avoids iterating over the string multiple times. It changes
the assumption that a search for "fré" should not match
"fre", which I think is better. If I search for "fré" in my
contacts, I want "frederic@example.com" to match.
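The matching behaviour argued for here can be sketched as folding both the search term and the candidate to ASCII before comparing prefixes. This is not the code from the attached patch: the function names are hypothetical, and the tiny lookup table only covers the lowercase accented letters U+00E0..U+00FF, whereas real code would rely on Unicode decomposition.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Base letters for U+00E0..U+00FF; 0 marks "no ASCII base letter".
 * Illustrative only, not a complete transliteration table. */
static const char latin1_base[32] = {
  'a', 'a', 'a', 'a', 'a', 'a', 0,   'c',   /* à á â ã ä å æ ç */
  'e', 'e', 'e', 'e', 'i', 'i', 'i', 'i',   /* è é ê ë ì í î ï */
  0,   'n', 'o', 'o', 'o', 'o', 'o', 0,     /* ð ñ ò ó ô õ ö ÷ */
  0,   'u', 'u', 'u', 'u', 'y', 0,   'y'    /* ø ù ú û ü ý þ ÿ */
};

/* Fold a UTF-8 word to lowercase ASCII. Returns false if any character
 * has no mapping or if the output buffer is too small. */
bool
fold_word (const char *in, char *out, size_t out_size)
{
  const unsigned char *p = (const unsigned char *) in;
  size_t n = 0;

  assert (in != NULL && out != NULL);

  if (out_size == 0)
    return false;

  while (*p != '\0')
    {
      char c;

      if (*p < 0x80)
        c = (char) tolower (*p++);           /* plain ASCII byte */
      else if (p[0] == 0xC3 && p[1] >= 0xA0 && p[1] <= 0xBF
               && latin1_base[p[1] - 0xA0] != 0)
        {
          c = latin1_base[p[1] - 0xA0];      /* accented letter -> base */
          p += 2;
        }
      else
        return false;                        /* cannot be ascii-ified */

      if (n + 1 >= out_size)
        return false;                        /* keep room for the NUL */
      out[n++] = c;
    }

  out[n] = '\0';
  return true;
}

/* True when the folded query is a prefix of the folded candidate. */
bool
folded_prefix_match (const char *query, const char *candidate)
{
  char q[64], c[64];

  if (!fold_word (query, q, sizeof q) || !fold_word (candidate, c, sizeof c))
    return false;

  return strncmp (q, c, strlen (q)) == 0;
}
```

With this sketch, folded_prefix_match ("fré", "frederic@example.com") is true because both sides fold to a string starting with "fre", while a word such as "礼" makes fold_word() fail outright instead of producing an empty string.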
Comment 3 GNOME Infrastructure Team 2018-05-24 15:45:57 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed to further activity.

You can subscribe and participate further in the new issue through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/768.