Bug 170414 – Optimizing the opentype handling code



Summary:	Optimizing the opentype handling code


Status:	RESOLVED FIXED

Product:	pango
Classification:	Platform
Component:	general
Version:	unspecified
Hardware:	Other Linux

Importance:	Normal enhancement
Target Milestone:	1.9.0
Assigned To:	pango-maint
QA Contact:	pango-maint


Reported:	2005-03-15 04:37 UTC by Behdad Esfahbod
Modified:	2006-03-11 21:57 UTC


Attachments
Main patch, using bsearch in pango/opentype/ftxopen.c (9.32 KB, patch) 2005-03-15 04:39 UTC, Behdad Esfahbod [none]
Make pangoft2topgm suitable for benchmarking (7.68 KB, patch) 2005-03-15 04:41 UTC, Behdad Esfahbod [none]
Test case used (10.98 KB, text/plain) 2005-03-15 04:43 UTC, Behdad Esfahbod
Misc optimizations (3.58 KB, patch) 2005-03-15 04:48 UTC, Behdad Esfahbod [committed]
Main patch revisited, Include stdlib if using bsearch. (9.52 KB, patch) 2005-03-15 06:47 UTC, Behdad Esfahbod [none]
Using call tables for lookup dispatch in gsub and gpos code (18.41 KB, patch) 2005-03-15 09:39 UTC, Behdad Esfahbod [committed]
Updated pangoft2topgm benchmarking patch (7.76 KB, patch) 2005-03-15 09:45 UTC, Behdad Esfahbod [committed]

Description Behdad Esfahbod 2005-03-15 04:37:02 UTC

Version details: HEAD

I felt like optimizing pango, so I profiled it with oprofile and found that the
hand-coded binary searches in ftxopen take more than 18% of the running time of
my benchmark. Replacing them with bsearch reduced that to 8%, and the actual
running time dropped from 1m33s to 1m21s.

The benchmark is a modification of pangoft2topgm that generates the bitmap 1000
times, and doesn't produce any output.

I used the font Nazli available from
http://www.farsiweb.info/wiki/Products/PersianFonts,  which is a light-weight
Persian font.  Using Nazli, freetype takes about 8% of the running time, while
using the huge Tahoma, it will take around 60%!

The text rendered in the benchmark is my Persian translation of a short story by
Richard Brautigan, called "What Are You Going to Do with 390 Photographs of
Christmas Trees?", available at  http://www.riza.com/richard/christmas.shtml.

Comment 1 Behdad Esfahbod 2005-03-15 04:39:29 UTC

Created attachment 38728
Main patch, using bsearch in pango/opentype/ftxopen.c

This is the main patch.  It adds a configure check for bsearch and, under
#ifdef, uses it in three places in the ftxopen functions Get_Class and
Coverage_Index.
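
As an illustration only (this is not the code from the attachment), the kind of
change described above can be sketched as follows.  The GlyphArray type, the
function names and the HAVE_BSEARCH macro name are all assumptions made for the
sketch; the real ftxopen structures differ.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a coverage table: a sorted array of glyph IDs. */
typedef struct {
  const unsigned short *glyphs;
  size_t                count;
} GlyphArray;

/* Comparison callback for bsearch(): key and element are both glyph IDs. */
static int compare_glyph_id(const void *key, const void *elem)
{
  unsigned short k = *(const unsigned short *) key;
  unsigned short e = *(const unsigned short *) elem;

  return (k > e) - (k < e);
}

/* Hand-coded binary search, in the style the patch replaces. */
static int find_glyph_manual(const GlyphArray *a, unsigned short glyph,
                             size_t *index)
{
  size_t lo = 0, hi = a->count;

  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;

    if (a->glyphs[mid] < glyph)
      lo = mid + 1;
    else if (a->glyphs[mid] > glyph)
      hi = mid;
    else {
      *index = mid;
      return 1;
    }
  }
  return 0;
}

/* The same lookup delegated to the C library's bsearch(). */
static int find_glyph_bsearch(const GlyphArray *a, unsigned short glyph,
                              size_t *index)
{
  const unsigned short *hit;

  if (a->count == 0)
    return 0;
  hit = bsearch(&glyph, a->glyphs, a->count, sizeof a->glyphs[0],
                compare_glyph_id);
  if (hit == NULL)
    return 0;
  *index = (size_t) (hit - a->glyphs);
  return 1;
}

/* Select the implementation at compile time; HAVE_BSEARCH is an assumed
 * name for the macro a configure check would define. */
static int find_glyph(const GlyphArray *a, unsigned short glyph, size_t *index)
{
#ifdef HAVE_BSEARCH
  return find_glyph_bsearch(a, glyph, index);
#else
  return find_glyph_manual(a, glyph, index);
#endif
}
```

Both variants return 1 and store the position when the glyph is present, and
return 0 otherwise, so they can be swapped without touching callers.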

Comment 2 Behdad Esfahbod 2005-03-15 04:41:46 UTC

Created attachment 38729
Make pangoft2topgm suitable for benchmarking

This adds support to pangoft2topgm for skipping output generation when the
output filename is an empty string, and for rendering the text more than once
via the --runs option.  It should be pretty harmless to apply, and it eases
benchmarking.

Comment 3 Behdad Esfahbod 2005-03-15 04:43:31 UTC

Created attachment 38730
Test case used

This is the Persian translation used as the test case.  It would be a good
candidate for inclusion under examples/ as a long text in Arabic script.

Comment 4 Behdad Esfahbod 2005-03-15 04:46:12 UTC

To profile, download and install the Nazli font from the URL in the
description, download the test case and put it under examples/ in the pango
tree, then run the following script in examples/:

#!/bin/bash

sudo opcontrol --start-daemon
sudo opcontrol --reset
sudo opcontrol --start
time ./pangoft2topgm --font Nazli --hinting=none --width=720 \
    christmas-trees-persian.utf8 --output "" --runs 1000
sudo opcontrol --stop
../libtool --mode=execute opannotate --source ./pangoft2topgm > annot
../libtool --mode=execute opreport --symbols ./pangoft2topgm > symbs

Comment 5 Behdad Esfahbod 2005-03-15 04:48:06 UTC

Created attachment 38731
Misc optimizations

Finally, here go some useless optimizations and cleanups, just in case
Owen is in the mood for applying useless optimizations today :).  Sorry for
spamming everyone multiple times.

Comment 6 Behdad Esfahbod 2005-03-15 06:26:25 UTC

Humm, G_UNLIKELY can be used in the compare functions, but I didn't do that
right now, since opentype code is not including glib at the moment.  I would be
more than happy to go ahead and fix that.
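
For context, G_UNLIKELY is GLib's branch-prediction hint.  A rough sketch of how
such a hint could look in a bsearch compare function, without pulling in GLib,
is shown here.  MY_UNLIKELY is a local stand-in macro, and marking the equality
branch as the unlikely one is an assumption about what was intended, based on
the fact that a search compares unequal on every step except the final hit.

```c
#include <stdlib.h>

/* Local stand-in for a branch hint; falls back to a no-op when the
 * compiler does not provide __builtin_expect. */
#if defined(__GNUC__)
#define MY_UNLIKELY(expr) (__builtin_expect(!!(expr), 0))
#else
#define MY_UNLIKELY(expr) (expr)
#endif

/* Compare callback for bsearch() over a sorted array of glyph IDs.
 * Equality happens at most once per search, so it is hinted as unlikely. */
static int compare_glyph_hinted(const void *key, const void *elem)
{
  unsigned short k = *(const unsigned short *) key;
  unsigned short e = *(const unsigned short *) elem;

  if (MY_UNLIKELY(k == e))
    return 0;
  return k < e ? -1 : 1;
}
```

The hint only affects code layout chosen by the compiler; the comparison result
is the same with or without it.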

Comment 7 Behdad Esfahbod 2005-03-15 06:47:36 UTC

Created attachment 38732
Main patch revisited, Include stdlib if using bsearch.

Comment 8 Behdad Esfahbod 2005-03-15 09:39:01 UTC

Created attachment 38735
Using call tables for lookup dispatch in gsub and gpos code

This patch replaces the switch statements in Do_Glyph_Lookup with a call table,
in both the gpos and gsub code.  It saves a couple more percent and basically
keeps Do_Glyph_Lookup from showing up in the profiling output.
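
The general technique, dispatching through an array of function pointers
indexed by lookup type instead of a switch, can be sketched as below.  This is
a simplified illustration, not the code from the attachment: the handler names,
their signature and their bodies are placeholders invented for the sketch.

```c
#include <stddef.h>

/* Hypothetical handler signature; the real lookup functions take more
 * arguments. */
typedef int (*LookupFunc)(int glyph);

enum {
  LOOKUP_SINGLE    = 1,
  LOOKUP_MULTIPLE  = 2,
  LOOKUP_ALTERNATE = 3,
  LOOKUP_TYPE_MAX  = 3
};

/* Placeholder handlers with distinguishable results. */
static int lookup_single(int glyph)    { return glyph + 1; }
static int lookup_multiple(int glyph)  { return glyph * 2; }
static int lookup_alternate(int glyph) { return -glyph; }

/* Switch-based dispatch: one comparison chain or jump per call. */
static int dispatch_switch(unsigned type, int glyph, int *out)
{
  switch (type) {
    case LOOKUP_SINGLE:    *out = lookup_single(glyph);    return 0;
    case LOOKUP_MULTIPLE:  *out = lookup_multiple(glyph);  return 0;
    case LOOKUP_ALTERNATE: *out = lookup_alternate(glyph); return 0;
    default:               return -1;  /* unknown lookup type */
  }
}

/* Call table indexed by lookup type; slot 0 is unused. */
static const LookupFunc lookup_table[LOOKUP_TYPE_MAX + 1] = {
  NULL, lookup_single, lookup_multiple, lookup_alternate
};

/* Table-based dispatch: a bounds check plus one indirect call. */
static int dispatch_table(unsigned type, int glyph, int *out)
{
  if (type > LOOKUP_TYPE_MAX || lookup_table[type] == NULL)
    return -1;  /* unknown lookup type */
  *out = lookup_table[type](glyph);
  return 0;
}
```

Whether the table actually beats the switch depends on the compiler and
optimization level, so it is worth measuring rather than assuming.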

Comment 9 Behdad Esfahbod 2005-03-15 09:45:00 UTC

Created attachment 38736
Updated pangoft2topgm benchmarking patch

Turn compiler warning off.

Comment 10 Owen Taylor 2005-06-17 16:09:30 UTC

Are you sure that you aren't just pushing CPU time around with the
bsearch patch... moving it from the Pango library to libc? While bsearch
is likely more optimized than the hand-coded bsearches, my expectation
is that it's not *a lot* better, and there is considerably more function
call overhead using system bsearch.

What are the overall run time changes from your various patches? 
(using 'time', say)

Comment 11 Behdad Esfahbod 2005-06-19 00:34:32 UTC

About pushing CPU time around: no, since I see bsearch showing up in the symbols
with a reasonable (7%) share after the patch.  About it not being *a lot*
better: you are definitely right, that's my expectation too.  With HEAD, I
switched to 'init 1' and made more accurate measurements.  I ran both scenarios
10 times and dropped the way-off results; the rest were pretty stable.  The
averages are presented here:

Without patches:
runs: 200, font: Nazli, text: christmas-trees-persian.utf8
16.58user 0.45system 0:17.04elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+50481minor)pagefaults 0swaps

With patches:
runs: 200, font: Nazli, text: christmas-trees-persian.utf8
15.27user 0.41system 0:15.69elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+50495minor)pagefaults 0swaps


This shows a net speedup of about 8% in this benchmark.
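
For reference, the figure follows from the elapsed times reported above (17.04 s
without the patches, 15.69 s with them); the user times (16.58 s and 15.27 s)
give essentially the same ratio:

```latex
\[
\frac{17.04 - 15.69}{17.04} = \frac{1.35}{17.04} \approx 0.079,
\qquad
\frac{16.58 - 15.27}{16.58} = \frac{1.31}{16.58} \approx 0.079
\]
```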

Comment 12 Behdad Esfahbod 2005-10-26 06:46:53 UTC

I'm marking the bsearch patch obsolete: now that I'm profiling with -O3 (and
with another benchmark, Federico's), it makes things slower.  The other
patches still save a few percent, though.

Comment 13 Behdad Esfahbod 2005-11-03 20:14:49 UTC

2005-11-03  Behdad Esfahbod  <behdad@gnome.org>

        Patches from #170414.  Reviewed by Matthias Clasen.

        * pango/opentype/ftxgpos.c, pango/opentype/ftxgsub.c: Use call table
        to dispatch different lookup types.

        * pango/opentype/pango-ot-buffer.c, pango/opentype/pango-ot-ruleset.c:
        Small cleanup.

Comment 14 Behdad Esfahbod 2005-11-03 20:38:28 UTC

2005-11-03  Behdad Esfahbod  <behdad@gnome.org>

        * examples/pangoft2pgm.c, renderdemo.c, renderdemo.h: Added a --runs
        option, useful for profiling.  Misc cleanup, freeing memory. (from
        #170414)