GNOME Bugzilla – Bug 694534
change to unichar_index() fixes assertion failure with U+1D173
Last modified: 2013-02-24 23:43:01 UTC
The punchline is that I think unichar_index() in "basic-win32.c" should increment its counter on a low surrogate, and not on a high surrogate:

    for (i = 0; i < ix; i++)
      /* Ignore the high surrogate */
      if (!(wtext[i] >= 0xD800 && wtext[i] < 0xDBFF))
        index++;

This change should only matter when the supplied index `ix' is in the middle of a surrogate pair. It makes the function return the index of the character represented by the surrogate pair instead of the index of the character after it.

The change is specifically intended to address a crash when using the character U+1D173. We ran into the problem in the context of Racket, but pasting a U+1D173 into GIMP crashes with the same assertion failure, at g_assert (glyphs->log_clusters[glyphix] < n_chars) in convert_log_clusters_to_byte_offsets().

The problem is that U+1D173 is complex in the sense of ScriptIsComplex(), and so uniscribe_shape() is used, which calls set_up_pango_log_clusters(). The `itemlen' for that call is 2, since the character is represented by two UTF-16 code units (i.e., a surrogate pair), and so the log-cluster array has two elements. Here's the weird part: each half of the surrogate pair has been mapped to itself by the ScriptShape() function, instead of both halves mapping to the beginning of the surrogate pair. That does not make sense to me, and I think it may be a problem with ScriptShape(). In any case, when unichar_index() is called with offset 1 for the second of the pair, it reports a 1 back, which would mean a second character, but there is only one character overall.

Assuming that the middle-of-surrogate problem isn't due to a misuse of ScriptShape() somehow, having unichar_index() effectively be more defensive seems like a good general approach.

Related: I think the third argument to the call to ScriptItemize() in itemize_shape_and_place() should be G_N_ELEMENTS (items)-1, i.e., one less than the size of `items'.
The "< 0xDBFF" in the suggested fix for unichar_index() should be "<= 0xDBFF".
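For reference, here is a minimal standalone sketch of the counting loop with the corrected "<= 0xDBFF" bound. It is not the actual code from "basic-win32.c": the function name unichar_index_sketch and the utf16_unit typedef are made up for illustration, and the real function's signature may differ.

```c
/* Hypothetical stand-in for a UTF-16 code unit (wchar_t on Windows). */
typedef unsigned short utf16_unit;

/* Sketch of the proposed counting rule: return the number of characters
 * that are complete before code-unit offset ix.  A high surrogate
 * (0xD800..0xDBFF) is skipped, so a surrogate pair is counted only once
 * its low surrogate has been passed. */
static int
unichar_index_sketch (const utf16_unit *wtext, int ix)
{
  int i, index = 0;

  for (i = 0; i < ix; i++)
    /* Ignore the high surrogate */
    if (!(wtext[i] >= 0xD800 && wtext[i] <= 0xDBFF))
      index++;

  return index;
}
```

U+1D173 is encoded in UTF-16 as the pair 0xD834 0xDD73. With this rule, offsets 0 and 1 both map to character index 0, which stays below n_chars == 1, and offset 2 maps to 1.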
This is a duplicate of Bug 668154. The patch here is a somewhat different approach than the patch attached to that ticket, although I think that both will work. This ends up allocating less memory, but I'm not familiar enough with the intent of the current code to know which is a better solution.
I'm not certain, but I don't think the patch attached to Bug 668154 is right. I'll comment there.
OK, let's mark this as a dup and discuss in the other bug. *** This bug has been marked as a duplicate of bug 668154 ***