Bug 138180 - add new OtherLetters types for G_UNICODE_OTHER_LETTERS
Status: RESOLVED DUPLICATE of bug 97545
Product: pango
Classification: Platform
Component: general
Version: 1.4.x
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: pango-maint
QA Contact: pango-maint
Depends on:
Blocks:
 
 
Reported: 2004-03-26 09:06 UTC by federic zhang
Modified: 2004-12-22 21:47 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
one patch to define new wordtype, OtherLetters (2.37 KB, patch)
2004-03-26 09:23 UTC, federic zhang
with the patch, only adjacent chinese ideograph characters are treated as one word (22.17 KB, image/png)
2004-03-26 09:25 UTC, federic zhang
by looking up one dictionary, the word, 'world' in chinese, has been highlighted correctly (25.93 KB, image/png)
2004-03-29 07:47 UTC, federic zhang

Description federic zhang 2004-03-26 09:06:45 UTC
I am not sure whether my request makes sense. When doing word breaking,
Pango treats as one word all adjacent characters whose types are any of
G_UNICODE_LOWERCASE_LETTER, G_UNICODE_MODIFIER_LETTER,
G_UNICODE_TITLECASE_LETTER, G_UNICODE_UPPERCASE_LETTER and
G_UNICODE_OTHER_LETTER. So, typically in a Chinese locale, any string that
consists only of ASCII alphabetic characters ('a-z' and 'A-Z') and Chinese
ideographic characters is treated as one word. Maybe it makes sense to
separate them, so that adjacent Chinese characters form one word and
adjacent ASCII characters form another word.
Comment 1 federic zhang 2004-03-26 09:23:33 UTC
Created attachment 25966
one patch to define new wordtype, OtherLetters
Comment 2 federic zhang 2004-03-26 09:25:51 UTC
Created attachment 25967
with the patch, only adjacent chinese ideograph characters are treated as one word
Comment 3 Noah Levitt 2004-03-26 14:23:50 UTC
You’re conflicting with Unicode again. :) pango_break() needs a rewrite anyway;
see bug 97545. I don't know whether the Unicode standard agrees with your idea or not.
Comment 4 Owen Taylor 2004-03-26 14:27:02 UTC
Once we actually have a TR #29-conformant implementation, we may need
to consider whether we, in practice, want different rules. But I don't
want to look at deviating *before* we actually conform.


*** This bug has been marked as a duplicate of 97545 ***
Comment 5 federic zhang 2004-03-29 07:47:27 UTC
Created attachment 26046
by looking up one dictionary, the word, 'world' in chinese, has been highlighted correctly
Comment 6 federic zhang 2004-03-29 07:48:41 UTC
It's good to move to a TR #29-conformant implementation, and I hope we
won't have to wait too long. But I am afraid that it still won't satisfy our
needs. In a Chinese environment, a dictionary should be used for word breaking;
see the previous attachment for reference. If pango_break() is rewritten, please
take this into account by defining a generic interface for looking up any
dictionary that is available.
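One possible shape for such an interface, sketched in Python for illustration only. Nothing here is an existing Pango API: the `segment` function, the `in_dictionary` callback and the `max_len` parameter are hypothetical, and greedy longest-match is just one simple dictionary-based strategy among several.

```python
from typing import Callable, List


def segment(text: str,
            in_dictionary: Callable[[str], bool],
            max_len: int = 4) -> List[str]:
    """Greedy longest-match segmentation against a pluggable dictionary.

    in_dictionary is the generic lookup hook: any callable that reports
    whether a candidate string is a known word.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until one matches.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            # A single character is always accepted as a fallback word,
            # so the loop is guaranteed to make progress.
            if length == 1 or in_dictionary(candidate):
                words.append(candidate)
                i += length
                break
    return words


# Toy dictionary used only for this example.
toy_dictionary = {"你好", "世界"}
print(segment("你好世界", toy_dictionary.__contains__))
```

Because the lookup is passed in as a callable, the same breaking code could be backed by an in-memory set, a file-based word list, or a system dictionary service without changing the segmentation logic.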