After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 694534 - change to unichar_index() fixes assertion failure with U+1D173
change to unichar_index() fixes assertion failure with U+1D173
Status: RESOLVED DUPLICATE of bug 668154
Product: pango
Classification: Platform
Component: win32
unspecified
Other Windows
: Normal normal
: ---
Assigned To: gtk-win32 maintainers
pango-maint
Depends on:
Blocks:
 
 
Reported: 2013-02-23 15:32 UTC by Matthew Flatt
Modified: 2013-02-24 23:43 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Matthew Flatt 2013-02-23 15:32:04 UTC
The punchline is that I think unichar_index() in "basic-win32.c" should
increment its counter on a low surrogate, and not on a high surrogate:

  for (i = 0; i < ix; i++)
    /* Ignore the high surrogate */
    if (!(wtext[i] >= 0xD800 && wtext[i] < 0xDBFF))
      index++;

This change should only matter when the supplied index
`ix' is in the middle of a surrogate pair. It makes the
function return the index of the character represented
by the surrogate pair instead of the character after.

The change is specifically intended to address a crash when using
the character U+1D173. We ran into the problem in the context of
Racket, but pasting a U+1D173 into GIMP crashes with the same
assertion failure: at g_assert (glyphs->log_clusters[glyphix] < n_chars)
in convert_log_clusters_to_byte_offsets().

The problem is that U+1D173 is complex in the sense of
ScriptIsComplex(), and so uniscribe_shape() is used,
which calls set_up_pango_log_clusters(). The `itemlen'
for that call is 2, since the character is represented by
two UTF-16 code units (i.e., a surrogate pair), and so
the log-cluster array has two elements. Here's the weird
part: each half of the surrogate has been mapped to itself
by the ScriptShape() function, instead of both mapping to
the beginning of the surrogate pair; that does not
make sense to me, and I thin kit may be a problem with
ScriptShape(). In any case, when unichar_index() is called
with offset 1 for the second of the pair, it reports a 1 back,
which would mean a second character --- but there is only
one character overall. 

Assuming that the middle-of-surrogate problem isn't
due to a misuse of ScriptShape() somehow, then having
unichar_index() effectively be more defensive seems like
a good general approach.


Related: I think the third argument to the
call to ScriptItemize() in itemize_shape_and_place()
should be G_N_ELEMENTS (items)-1 --- one less
than the size of `items'.
Comment 1 Matthew Flatt 2013-02-24 00:00:29 UTC
The "< 0xDBFF" in the suggested fix for unichar_index()
should be "<= 0xDBFF".
Comment 2 Daniel Atallah 2013-02-24 15:15:45 UTC
This is a duplicate of Bug 668154.

The patch here is a somewhat different approach than the patch attached to that ticket, although I think that both will work.

This ends up allocating less memory, but I'm not familiar enough with the intent of the current code to know which is a better solution.
Comment 3 Matthew Flatt 2013-02-24 15:23:38 UTC
I'm not certain, but I don't think the patch with Bug 668154 is right. I'll comment there.
Comment 4 Behdad Esfahbod 2013-02-24 23:43:01 UTC
Ok, lets mark dup and discuss in the other bug.

*** This bug has been marked as a duplicate of bug 668154 ***