GNOME Bugzilla – Bug 350610
Unicode Bidirectional types and functions
Last modified: 2008-11-29 00:02:46 UTC
Currently, if I want to find the direction type of a gunichar or the base direction of a string, I have to link Pango. This would be a good candidate for inclusion in GLib along with the other gunichar functions. Specifically:
  enum GUnicodeDirection {...}  (from PangoDirection)
  GUnicodeDirection g_unichar_direction (gunichar c)  (from pango_unichar_direction)
  GUnicodeDirection g_utf8_base_direction (const gchar *str, gssize len)  (from pango_find_base_dir)
An implementation of the Unicode Bidirectional Algorithm might also be useful.
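For illustration, the proposed base-direction lookup can be sketched as a first-strong-character scan, which is how pango_find_base_dir() is documented to behave. This is a toy under stated assumptions: `Direction`, `char_direction()` and `base_direction()` are hypothetical names, the input is an array of code points rather than UTF-8, and `char_direction()` only recognises ASCII letters, Hebrew letters (U+05D0 to U+05EA) and basic Arabic letters (U+0621 to U+064A) instead of consulting the full Unicode Bidi_Class data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the proposed GUnicodeDirection enum. */
typedef enum {
    DIR_LTR,
    DIR_RTL,
    DIR_NEUTRAL
} Direction;

/* Toy classifier (assumption): covers only ASCII letters, Hebrew letters
 * and basic Arabic letters. A real version would use Unicode Bidi_Class. */
static Direction char_direction(uint32_t c)
{
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
        return DIR_LTR;
    if (c >= 0x05D0 && c <= 0x05EA)   /* Hebrew letters: bidi class R */
        return DIR_RTL;
    if (c >= 0x0621 && c <= 0x064A)   /* Arabic letters: bidi class AL */
        return DIR_RTL;
    return DIR_NEUTRAL;               /* digits, spaces, punctuation, ... */
}

/* Return the direction of the first strongly directional character,
 * or DIR_NEUTRAL if the text contains none. */
Direction base_direction(const uint32_t *text, size_t len)
{
    assert(text != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        Direction d = char_direction(text[i]);
        if (d != DIR_NEUTRAL)
            return d;
    }
    return DIR_NEUTRAL;
}
```

With this heuristic, "123 abc" is LTR (the digits are skipped as neutral), while a string starting with U+05D0 ALEF is RTL even if Latin text follows it.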
Out of curiosity, what are you planning to use the bidirectional algorithm for?
Well, actually I wasn't. It just seemed like a good idea, since it's specified in Unicode but non-trivial to implement correctly. The basic directionality stuff, though, is actually your fault :). I read Planet GNOME in Liferea, and because you sign your name in Arabic script, it messes up the directionality of the item list (<name>: <item>): with your name at the front, the base direction is detected as RTL. Obviously, to fix this you have to detect the directionality of the overall feed itself (from the title, say), which means linking Pango. That seems unfair given that most other Unicode data is in GLib. Actually, that's another thing: Liferea now overrides item directionality by inserting an LRM or RLM at the start of the text. Would it make sense to ask for a widget property to override the text base direction? I'm thinking of four options:
  GTK_WIDGET_TEXT_BASE_DIRECTION_DEFAULT  /* default: use direction from text */
  GTK_WIDGET_TEXT_BASE_DIRECTION_WIDGET   /* as gtk_widget_get_direction()/gtk_widget_get_default_direction() */
  GTK_WIDGET_TEXT_BASE_DIRECTION_LTR      /* left-to-right */
  GTK_WIDGET_TEXT_BASE_DIRECTION_RTL      /* right-to-left */
This would be fed through to Pango via pango_layout_set_auto_dir() and pango_context_set_base_dir(). Should I file an enhancement bug on this?
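The LRM/RLM workaround mentioned above can be illustrated with a small helper. This is only a sketch, not Liferea's actual code: `FeedDirection` and `prefix_direction_mark()` are hypothetical names, and it assumes the feed's direction has already been determined (e.g. from the title). U+200E LEFT-TO-RIGHT MARK and U+200F RIGHT-TO-LEFT MARK are invisible strong characters, so a first-strong-character detector sees the mark before it sees the author's name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: the feed-level direction, determined elsewhere. */
typedef enum { FEED_DIR_LTR, FEED_DIR_RTL } FeedDirection;

/* UTF-8 encodings of U+200E (LRM) and U+200F (RLM). */
#define LRM_UTF8 "\xE2\x80\x8E"
#define RLM_UTF8 "\xE2\x80\x8F"

/* Return a newly allocated copy of item_text prefixed with LRM or RLM.
 * The caller frees the result; returns NULL on allocation failure. */
char *prefix_direction_mark(const char *item_text, FeedDirection dir)
{
    assert(item_text != NULL);
    const char *mark = (dir == FEED_DIR_RTL) ? RLM_UTF8 : LRM_UTF8;
    size_t mark_len = strlen(mark);           /* 3 bytes in UTF-8 */
    size_t text_len = strlen(item_text);
    char *out = malloc(mark_len + text_len + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, mark, mark_len);
    memcpy(out + mark_len, item_text, text_len + 1); /* copies the NUL too */
    return out;
}
```

The alternative proposed above, a widget property, would avoid modifying the text at all; the mark-based approach works with any widget but leaves an invisible character in the string.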
We have bugs open for adding Pango markup/attributes that allow things like that.
Ah: bug 70399 and bug 168108. Thanks.
The IDNA algorithms impose certain restrictions on the bidi properties of characters in internationalized domain names, to avoid a case where two distinct hostnames would display identically because of directionality issues. (E.g., www.אבג123.com and www.123אבג.com, where the first is "aleph bet gimel 1 2 3" and the second is "1 2 3 aleph bet gimel".) Enforcing the restriction is more important for nameserver implementations than for clients (since if a client looks up an invalid name, it will just get a "not found"), but the specs say that clients are supposed to do the checks as well anyway. (Security reasons?) So this is something that would require access to the bidi properties, though not the whole bidi algorithm. The current rule (from RFC 3454) is that if any character in a segment of a domain name has bidi character type R or AL, then the segment must start and end with an R or AL character, and cannot contain any L characters. However, this rule doesn't work with some languages and is being updated (http://tools.ietf.org/wg/idnabis/draft-ietf-idnabis-bidi/). The currently proposed new rule makes use of even more distinct bidi types than the old one, so it would definitely require more than just the current PangoDirection values.
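The RFC 3454 rule quoted above is straightforward to check once each character's bidi class is known. The sketch below assumes those classes have already been looked up elsewhere (for example via FriBidi or Pango's exported bidi type); `BidiClass`, with its deliberately coarse `BIDI_OTHER` bucket for digits and punctuation, and `label_passes_rfc3454_bidi()` are hypothetical names. It covers only the old RFC 3454 rule as stated here, not the revised idnabis draft.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, coarse subset of Unicode bidi classes. */
typedef enum {
    BIDI_L,      /* strong left-to-right */
    BIDI_R,      /* strong right-to-left (e.g. Hebrew) */
    BIDI_AL,     /* right-to-left Arabic letter */
    BIDI_OTHER   /* everything else: EN digits, punctuation, ... */
} BidiClass;

static bool is_randal(BidiClass c)
{
    return c == BIDI_R || c == BIDI_AL;
}

/* RFC 3454 bidi rule for one label: if any character is R or AL, then
 * no character may be L, and the label must start and end with R or AL. */
bool label_passes_rfc3454_bidi(const BidiClass *cls, size_t len)
{
    bool has_randal = false, has_l = false;
    assert(cls != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (is_randal(cls[i]))
            has_randal = true;
        if (cls[i] == BIDI_L)
            has_l = true;
    }
    if (!has_randal)
        return true;   /* the rule only constrains labels containing R/AL */
    if (has_l)
        return false;
    /* has_randal implies len >= 1, so indexing is safe here. */
    return is_randal(cls[0]) && is_randal(cls[len - 1]);
}
```

Under this rule both example labels are rejected: "אבג123" (R R R followed by three digits) does not end with an R/AL character, and "123אבג" does not start with one.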
Pango now exports the bidi type. And there's of course always GNU FriBidi. I'd rather not add these to glib right now.