GNOME Bugzilla – Bug 138180
add new OtherLetters types for G_UNICODE_OTHER_LETTERS
Last modified: 2004-12-22 21:47:04 UTC
I am not sure whether this request makes sense. When doing word breaking, Pango treats as one word all adjacent characters whose type is one of G_UNICODE_LOWERCASE_LETTER, G_UNICODE_MODIFIER_LETTER, G_UNICODE_TITLECASE_LETTER, G_UNICODE_UPPERCASE_LETTER or G_UNICODE_OTHER_LETTER. So in a Chinese locale, any string that consists only of ASCII alphabetic characters ('a-z' and 'A-Z') and Chinese ideographs is typically treated as one word. Maybe it makes sense to separate them, so that adjacent Chinese characters form one word and adjacent ASCII characters form another.
Created attachment 25966: a patch to define a new word type, OtherLetters
Created attachment 25967: with the patch, only adjacent Chinese ideographs are treated as one word
You’re conflicting with Unicode again. :) pango_break() needs a rewrite anyway, see bug 97545. I don't know whether the Unicode standard agrees with your idea or not.
Once we actually have a TR #29-conformant implementation, we may need to consider whether we, in practice, want different rules. But I don't want to look at deviating *before* we actually conform.
*** This bug has been marked as a duplicate of 97545 ***
Created attachment 26046: after looking up a dictionary, the Chinese word for 'world' is highlighted correctly
It's good to follow the new revision of TR #29 with a conformant implementation, and I hope we won't have to wait too long. But I am afraid that it still won't satisfy our needs. In a Chinese environment, a dictionary should be used for word breaking; see the previous attachment for reference. If pango_break() is rewritten, please take this into account by defining a generic interface for looking up a dictionary, if one is available.
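As a rough illustration of what dictionary-driven word breaking behind a generic lookup interface could look like, here is a small Python sketch. Everything in it is an assumption for illustration: the `Lookup` callable, the `segment` function, the `max_len` limit and the toy dictionary are hypothetical, and greedy forward maximum matching is just one simple strategy, not something this bug or Pango specifies.

```python
from typing import Callable, List

# Hypothetical generic interface: a dictionary is any predicate on strings,
# so a word list, a trie, or an external service could all be plugged in.
Lookup = Callable[[str], bool]


def segment(text: str, in_dictionary: Lookup, max_len: int = 4) -> List[str]:
    """Greedy forward maximum matching; characters not covered by a
    dictionary word fall back to one-character words."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or in_dictionary(candidate):
                words.append(candidate)
                i += length
                break
    return words


# Toy dictionary containing the Chinese word for 'world'.
toy_dictionary = {"世界", "你好"}
print(segment("你好世界", toy_dictionary.__contains__))  # ['你好', '世界']
```

Passing the lookup as a plain callable keeps the breaker independent of how the dictionary is stored.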