Bug 313781 - Hebrew vowels rendered wrong because shaper font cache gets polluted
Status: RESOLVED FIXED
Product: pango
Classification: Platform
Component: general
Version: 1.9.x
Hardware: Other All
Importance: Normal normal
Target Milestone: ---
Assigned To: pango-maint
QA Contact: pango-maint
Depends on:
Blocks:
Reported: 2005-08-18 03:13 UTC by Stephen Blackheath
Modified: 2005-08-25 21:34 UTC
See Also:
GNOME target: ---
GNOME version: ---

Description Stephen Blackheath 2005-08-18 03:13:32 UTC
It is possible to make an application go into a permanent state where it renders
Hebrew incorrectly, through pollution of pango's 'shaper font cache'.

See test case (at the bottom).  The bug occurs on both pango-1.8.2 and
pango-1.9.1.

I've given the output of the test case below.  It seems my method of
writing out the character number is not quite right, but that doesn't
matter for this purpose.

Running it with the '1' argument shows what the output SHOULD look like.
Running it with the '2' argument shows, at the bottom of the output,
that some of the characters are now being shaped by the "Basic" shape
engine.  This causes them to be positioned incorrectly when drawn.

The Hebrew word is 1489 1468 1464 1512 1464 1443 1488 (decimal code
points).  The first, middle (1512), and last characters are all Hebrew
*letters*, while the others are vowels and diacritical marks.  Pango
considers the vowels and diacritics to be "INHERIT" script type.
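
To make the decimal code points above easier to read, here is a minimal
sketch that sorts them into letters and combining marks using fixed
ranges of the Unicode Hebrew block.  The names CpKind, classify,
kind_name and print_classes are hypothetical helpers made up for this
illustration; they are not part of pango or GLib, and the ranges are a
rough stand-in for a real Unicode property lookup.

```c
#include <stdio.h>

/* Hypothetical helper type for illustration; not part of pango or GLib. */
typedef enum { CP_OTHER, CP_HEBREW_LETTER, CP_HEBREW_MARK } CpKind;

/* Classify a code point by fixed ranges of the Unicode Hebrew block. */
CpKind classify(unsigned int cp)
{
    if (cp >= 0x05D0 && cp <= 0x05EA)          /* alef .. tav */
        return CP_HEBREW_LETTER;
    if ((cp >= 0x0591 && cp <= 0x05BD) ||      /* accents and points */
        cp == 0x05BF || cp == 0x05C1 || cp == 0x05C2 ||
        cp == 0x05C4 || cp == 0x05C5 || cp == 0x05C7)
        return CP_HEBREW_MARK;
    return CP_OTHER;
}

/* Human-readable name for a class. */
const char *kind_name(CpKind kind)
{
    switch (kind) {
    case CP_HEBREW_LETTER: return "letter";
    case CP_HEBREW_MARK:   return "mark";
    default:               return "other";
    }
}

/* Print each code point in decimal and hex with its rough class. */
void print_classes(const unsigned int *cps, int n)
{
    int i;

    for (i = 0; i < n; i++)
        printf("%u U+%04X %s\n", cps[i], cps[i],
               kind_name(classify(cps[i])));
}
```

Calling print_classes() on the seven code points of the word reports
1489, 1512 and 1488 as letters and 1468, 1464 and 1443 as marks.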

This test tricks pango into polluting its 'shaper font cache' (as
implemented by shaper_font_cache_get() in pango-context.c).

It works like this:

The string 32, 1468, 1464, 32, 1464, 1443, 32 contains no characters
of the HEBREW script type, but we tell pango to use the Hebrew
language.  The vowels therefore resolve to the "Basic" shape engine
through the inheritance rules: their script type is considered to be
perhaps LATIN, but certainly not Hebrew.  Because of the "he" language
setting, however, this text uses the same cache as the good Hebrew text
does.  It pollutes that cache by storing a mapping from the individual
vowel characters to the Basic shape engine instead of the Hebrew one.

From then on, Hebrew text containing the polluted vowels and diacritics
is rendered wrong.
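
The mechanism described above can be illustrated with a toy model.  This
is only a sketch under simplifying assumptions, not pango's actual code:
ToyCache, cache_get and shape are hypothetical names, the cache stands in
for a per-language cache keyed by character alone, and "inheritance" is
reduced to taking the script of the preceding character.

```c
#define TOY_CACHE_MAX 32

typedef enum { ENGINE_BASIC, ENGINE_HEBREW } Engine;

/* Toy stand-in for a per-language cache: code point -> engine. */
typedef struct {
    unsigned int cp[TOY_CACHE_MAX];
    Engine engine[TOY_CACHE_MAX];
    int used;
} ToyCache;

int is_hebrew_letter(unsigned int cp)
{
    return cp >= 0x05D0 && cp <= 0x05EA;
}

/* Rough: U+0591..U+05C7 is mostly combining accents and points. */
int is_hebrew_mark(unsigned int cp)
{
    return cp >= 0x0591 && cp <= 0x05C7;
}

/* Return the cached engine for cp, or store and return the fallback. */
Engine cache_get(ToyCache *cache, unsigned int cp, Engine fallback)
{
    int i;

    for (i = 0; i < cache->used; i++)
        if (cache->cp[i] == cp)
            return cache->engine[i];
    if (cache->used < TOY_CACHE_MAX) {
        cache->cp[cache->used] = cp;
        cache->engine[cache->used] = fallback;
        cache->used++;
    }
    return fallback;
}

/* Pick an engine per character; marks inherit the previous script. */
void shape(ToyCache *cache, const unsigned int *text, int n, Engine *out)
{
    int prev_is_hebrew = 0;
    int i;

    for (i = 0; i < n; i++) {
        int is_hebrew;

        if (is_hebrew_letter(text[i]))
            is_hebrew = 1;
        else if (is_hebrew_mark(text[i]))
            is_hebrew = prev_is_hebrew;   /* "inherit" */
        else
            is_hebrew = 0;
        prev_is_hebrew = is_hebrew;
        out[i] = cache_get(cache, text[i],
                           is_hebrew ? ENGINE_HEBREW : ENGINE_BASIC);
    }
}
```

With a fresh ToyCache, shaping the Hebrew word assigns ENGINE_HEBREW to
every character.  If the space-and-vowel string is shaped first with the
same cache, the vowels are stored as ENGINE_BASIC, and the Hebrew word
afterwards gets ENGINE_BASIC for its vowels while its letters still get
ENGINE_HEBREW, which is the pattern seen in the test 2 output.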

It is difficult to avoid triggering this bug in a web browser, because
it throws all sorts of crud at the rendering engine.  I found the bug
through browser testing.


Steve

------ OUTPUT OF TEST CASE ------

aotearoa$ ./pango-test 1
Doing test 1 - Hebrew working properly

(pango-test:7215): Pango-WARNING **: Cannot open font file for font Ezra SIL 12
1488 HebrewEngineFc
1443 HebrewEngineFc
1512 HebrewEngineFc
1512 HebrewEngineFc
1489 HebrewEngineFc
1489 HebrewEngineFc
1489 HebrewEngineFc
---
aotearoa$ ./pango-test 2
Doing test 2 - Hebrew getting corrupted through cache poisoning

(pango-test:7217): Pango-WARNING **: Cannot open font file for font Ezra SIL 12
32 BasicEngineFc
32 BasicEngineFc
32 BasicEngineFc
32 BasicEngineFc
32 BasicEngineFc
1443 BasicEngineFc
32 BasicEngineFc
---
1488 HebrewEngineFc
1443 BasicEngineFc
1464 BasicEngineFc
1512 HebrewEngineFc
1468 BasicEngineFc
1468 BasicEngineFc
1489 HebrewEngineFc
---

pango-test.c
------------

#define PANGO_ENABLE_BACKEND
#define PANGO_ENABLE_ENGINE

#include <glib/gunicode.h>
#include <gtk/gtk.h>      /* gtk_init() */
#include <gdk/gdkpango.h>
#include <gdk/gdkrgb.h>
#include <pango/pango.h>
#include <stdlib.h>       /* atoi() */
#include <string.h>
#include <stdio.h>

void dump(PangoContext* pc, gunichar2* text16, int length16)
{
    gchar* text8;
    PangoLayoutLine* line;
    PangoLayout *layout;
    GSList *tmpList;

    text8 = g_utf16_to_utf8(text16, length16, NULL, NULL, NULL);

    layout = pango_layout_new(pc);

    pango_layout_set_text(layout, text8, strlen(text8));
    line = pango_layout_get_line(layout, 0);

    for (tmpList = line->runs; tmpList && tmpList->data;
         tmpList = tmpList->next) {
        gint i;
        PangoLayoutRun *layoutRun = (PangoLayoutRun *)tmpList->data;

        for (i=0; i < layoutRun->glyphs->num_glyphs; i++) {
            gint thisOffset = (gint)layoutRun->glyphs->log_clusters[i] +
                              layoutRun->item->offset;
            printf("%d %s\n", g_utf8_get_char(text8+thisOffset),
                   G_OBJECT_CLASS_NAME(PANGO_ENGINE_SHAPE_GET_CLASS(
                       layoutRun->item->analysis.shape_engine)));

        }
    }

    g_object_unref(layout);  /* release the layout created above */
    g_free(text8);
    printf("---\n");
}

int usage(char* argv0)
{
    fprintf(stderr, "Usage:\n");
    fprintf(stderr, "  %s 1  Show Hebrew working properly\n", argv0);
    fprintf(stderr,
            "  %s 2  Show Hebrew getting corrupted through cache poisoning\n",
            argv0);
    return 1;
}

int main(int argc, char* argv[])
{
    PangoContext* pc;
    gint i;
    PangoFontDescription* fd;
    gint test_no;

    gtk_init(&argc, &argv);

    if (argc < 2)
        return usage(argv[0]);

    test_no = atoi(argv[1]);
    if (test_no < 1 || test_no > 2)
        return usage(argv[0]);

    printf("Doing test %d - %s\n", test_no,
      test_no == 1 ? "Hebrew working properly"
                   : "Hebrew getting corrupted through cache poisoning");

    pc = gdk_pango_context_get();
    pango_context_set_language(pc, pango_language_from_string("he"));

    if (test_no == 2) {
          /* Formatting this string causes pango-1.8.2 to subsequently render
          Hebrew text brokenly. */
        gunichar2 text16[] = {' ', 1468, 1464, ' ', 1464, 1443, ' '};
        dump(pc, text16, 7);
    }

    {
        gunichar2 text16[] = {1489, 1468, 1464, 1512, 1464, 1443, 1488};
        dump(pc, text16, 7);
    }

    return 0;
}

Makefile
--------

CFLAGS = $(shell pkg-config --cflags gtk+-2.0)
LDFLAGS = $(shell pkg-config --libs gtk+-2.0 pango)

all: pango-test

pango-test: pango-test.c
	$(CC) -o pango-test pango-test.c $(CFLAGS) $(LDFLAGS)

clean:
	rm -f pango-test

Comment 1 Stephen Blackheath 2005-08-25 21:34:55 UTC
I have re-tested on pango-1.10.0, and it is fixed.