Bug 313907 - Update break.c to handle new line-breaking types in Unicode 4.1
Status: RESOLVED FIXED
Product: pango
Classification: Platform
Component: general
Version: 1.10.x
Hardware: Other All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: pango-maint
QA Contact: pango-maint
Depends on: 313583
Blocks:
 
 
Reported: 2005-08-19 00:54 UTC by Behdad Esfahbod
Modified: 2005-11-05 00:40 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
mentioned patch (13.20 KB, patch)
2005-08-19 00:56 UTC, Behdad Esfahbod
Status: committed

Description Behdad Esfahbod 2005-08-19 00:54:35 UTC
I'm attaching a patch for break.c to handle the new line-breaking types (for
Conjoining Jamo handling) in Unicode 4.1.  The logic is exactly the same in
UAX#14 (Line Breaking) and UAX#29 (Text Boundaries), so I have used the same
code for both, which is neat.
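
For readers unfamiliar with the shared logic, here is a minimal sketch of the
conjoining Jamo rule as it appears in both specifications (UAX#29 rules GB6-GB8
and UAX#14 rule LB26).  This is not the code from the patch: the JamoType enum
and jamo_no_break() are hypothetical names made up for illustration, and the
real break.c may organize this differently.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the Hangul syllable classes.
 * Not the enum used by Pango or GLib. */
typedef enum {
  JAMO_NONE, /* not a Hangul jamo or precomposed syllable */
  JAMO_L,    /* leading consonant  (JL in UAX#14) */
  JAMO_V,    /* vowel              (JV in UAX#14) */
  JAMO_T,    /* trailing consonant (JT in UAX#14) */
  JAMO_LV,   /* precomposed L+V    (H2 in UAX#14) */
  JAMO_LVT   /* precomposed L+V+T  (H3 in UAX#14) */
} JamoType;

/* Returns true when no break is allowed between prev and next:
 *   L          x (L | V | LV | LVT)
 *   (LV | V)   x (V | T)
 *   (LVT | T)  x T
 */
static bool
jamo_no_break (JamoType prev, JamoType next)
{
  switch (prev)
    {
    case JAMO_L:
      return next == JAMO_L || next == JAMO_V ||
             next == JAMO_LV || next == JAMO_LVT;
    case JAMO_LV:
    case JAMO_V:
      return next == JAMO_V || next == JAMO_T;
    case JAMO_LVT:
    case JAMO_T:
      return next == JAMO_T;
    default:
      return false;
    }
}
```

Because the same pairs are kept together for grapheme clusters and for line
breaking, one predicate of this shape could serve both passes.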

I used the testing patch in bug #97545 (by Noah Levitt) to verify that the Jamo
handling in grapheme clusters is still functioning correctly, and after fixing
the bugs, it is.

When testing with the mentioned test, I also changed the '\n' that was being
added to the end of paragraphs to a PARAGRAPH_SEPARATOR, since '\n' plays tricks
if preceded by '\r'.

After these changes, it passes all tests in GraphemeClusterBreakTest.txt of
Unicode 4.1.  (I'm working on the rest too.)
Comment 1 Behdad Esfahbod 2005-08-19 00:56:05 UTC
Created attachment 50962
mentioned patch

This requires the Unicode 4.1 data in glib.
Comment 2 Behdad Esfahbod 2005-10-01 12:38:45 UTC
Ok, this can be applied now, after requiring glib 2.9.  Awaiting review.
Comment 3 Matthias Clasen 2005-11-04 17:03:13 UTC
You should probably branch pango before applying this, otherwise you'll bump
the glib requirement in the middle of a stable series, which should not happen.

I won't claim to understand all the break algorithm changes in the patch,
but it looks generally sane to me. One change which makes me wonder is 
the following one:

@@ -520,7 +606,7 @@
               /* This is how we fill in the last element (end position) of the
                * attr array - assume there's a newline off the end of @text.
                */
-              next_wc = '\n';
+	      next_wc = PARAGRAPH_SEPARATOR;
             }
           else
             {


Why is this? It makes the preceding comment wrong, and it scares me a bit
if the rest of pango makes the assumption that there is a newline at the end...

I also noted that some comments in the patch refer to Unicode 4.2. You
probably want to make sure that the documentation refers to the right versions
(both in the comments and in the api docs).

Comment 4 Behdad Esfahbod 2005-11-04 20:14:55 UTC
Thanks Matthias.  We already have Pango 1.10 branched.  HEAD is 1.11 now.

About that change, as I wrote originally:  "When testing with the mentioned
test, I also changed the '\n' that was being added to the end of paragraphs to a
PARAGRAPH_SEPARATOR, since '\n' plays tricks if preceded by '\r'."  Adding '\n'
is an internal implementation detail of pango_default_break, meant to force a
line break opportunity at the end of the string, but '\n' doesn't work if
preceded by '\r'.  PARAGRAPH_SEPARATOR does.
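
To illustrate the '\r' problem, here is a toy model, not the logic in
pango_default_break.  It assumes the only rule in play is that CR followed by LF
is kept together as one unit, and that PARAGRAPH_SEPARATOR is U+2029.  The two
helper functions are hypothetical names made up for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PARAGRAPH_SEPARATOR 0x2029u /* U+2029 PARAGRAPH SEPARATOR (assumed value) */

/* Hypothetical helper: models only the CR LF rule, where a '\r'
 * immediately followed by '\n' is never split. */
static bool
break_allowed_between (uint32_t prev, uint32_t next)
{
  return !(prev == '\r' && next == '\n');
}

/* Hypothetical helper: decides whether the end of the text is a break
 * position when `sentinel` is used as the look-ahead character past
 * the last real character. */
static bool
end_is_break (const uint32_t *text, size_t len, uint32_t sentinel)
{
  if (len == 0)
    return true;
  return break_allowed_between (text[len - 1], sentinel);
}
```

In this model, text ending in '\r' pairs up with a '\n' sentinel as CR LF, so
the end position loses its break opportunity.  A U+2029 sentinel never forms
such a pair.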

I will check the docs, and apply.  Thanks.
Comment 5 Behdad Esfahbod 2005-11-05 00:40:39 UTC
2005-11-04  Behdad Esfahbod  <behdad@gnome.org>

        * pango/break.c: Update to handle new line-breaking types in the
        Unicode 4.1 UAX#14. (#313907)

        * configure.in: Bump required glib version to 2.9.0.  Needed for
        above-mentioned line-breaking types.