GNOME Bugzilla – Bug 170414
Optimizing the opentype handling code
Last modified: 2006-03-11 21:57:07 UTC
Version details: HEAD

I felt like optimizing Pango, so I profiled it with oprofile and found that the binary searches in ftxopen take more than 18% of the running time of my benchmark. Replacing them with bsearch reduced that to 8%, and the actual running time dropped from 1m33s to 1m21s.

The benchmark is a modification of pangoft2topgm that generates the bitmap 1000 times and doesn't produce any output. I used the font Nazli, available from http://www.farsiweb.info/wiki/Products/PersianFonts, which is a lightweight Persian font. With Nazli, FreeType takes about 8% of the running time, while with the huge Tahoma it takes around 60%!

The text rendered in the benchmark is my Persian translation of a short story by Richard Brautigan, called "What Are You Going to Do with 390 Photographs of Christmas Trees?", available at http://www.riza.com/richard/christmas.shtml.
Created attachment 38728
Main patch, using bsearch in pango/opentype/ftxopen.c

This is the main patch. It adds a configure check for bsearch and, under #ifdef, uses bsearch in three places in the ftxopen functions Get_Class and Coverage_Index.
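For readers without the patch at hand, here is a minimal sketch of the kind of change described: a hand-coded binary search over a sorted glyph array next to an equivalent bsearch call. It is not the code from ftxopen.c. The type glyph_id and the functions compare_glyph_ids, coverage_index_manual and coverage_index_bsearch are hypothetical names chosen for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical glyph ID type, used only for this sketch. */
typedef unsigned short glyph_id;

/* Comparison callback for bsearch(): first argument is the key,
 * second is a pointer to an array element. */
int compare_glyph_ids(const void *key, const void *elem)
{
    glyph_id k = *(const glyph_id *)key;
    glyph_id e = *(const glyph_id *)elem;
    return (k > e) - (k < e);
}

/* Hand-coded binary search over a sorted array.
 * Returns the index of glyph, or -1 if it is not present. */
int coverage_index_manual(const glyph_id *glyphs, size_t count, glyph_id glyph)
{
    size_t lo = 0, hi = count;   /* search the half-open range [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (glyphs[mid] == glyph)
            return (int)mid;
        if (glyphs[mid] < glyph)
            lo = mid + 1;
        else
            hi = mid;
    }
    return -1;
}

/* The same lookup expressed with the C library's bsearch(). */
int coverage_index_bsearch(const glyph_id *glyphs, size_t count, glyph_id glyph)
{
    const glyph_id *hit = bsearch(&glyph, glyphs, count,
                                  sizeof *glyphs, compare_glyph_ids);
    return hit ? (int)(hit - glyphs) : -1;
}
```

Both functions return the same index for any sorted input with unique entries. The bsearch version pays for an indirect call to the comparison callback on every step, which is the overhead questioned later in this bug.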
Created attachment 38729
Make pangoft2topgm suitable for benchmarking

This adds support to pangoft2topgm for not generating any output if the output filename is an empty string, and for rendering the text more than once using the --runs option. It should be pretty harmless to apply, and it eases benchmarking.
Created attachment 38730
Test case used

This is the Persian translation used as the test case. It would be a good candidate for inclusion under examples/ as a long text using the Arabic script.
To profile, download and install the Nazli font from the URL in comment 1, download the test case and put it under examples/ in the pango tree, then use the following script in examples/:

#!/bin/bash
sudo opcontrol --start-daemon
sudo opcontrol --reset
sudo opcontrol --start
time ./pangoft2topgm --font Nazli --hinting=none --width=720 christmas-trees-persian.utf8 --output "" --runs 1000
sudo opcontrol --stop
../libtool --mode=execute opannotate --source ./pangoft2topgm > annot
../libtool --mode=execute opreport --symbols ./pangoft2topgm > symbs
Created attachment 38731
Misc optimizations

Finally, here are some useless optimizations and cleanups, just in case Owen is in the mood for applying useless optimizations today :). Sorry for spamming everyone with multiple messages.
Humm, G_UNLIKELY can be used in the compare functions, but I didn't do that right now, since opentype code is not including glib at the moment. I would be more than happy to go ahead and fix that.
Created attachment 38732
Main patch revisited, Include stdlib if using bsearch.
Created attachment 38735
Using call tables for lookup dispatch in gsub and gpos code

This patch replaces the switches in Do_Glyph_Lookup with a call table, in both the gpos and gsub code. It saves a couple more percent and basically prevents Do_Glyph_Lookup from showing up in the profiling output.
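To illustrate what replacing a switch with a call table means, here is a small sketch under stated assumptions. The lookup type constants, the handler functions and both dispatchers are hypothetical stand-ins and do not match the signatures used in ftxgsub.c or ftxgpos.c.

```c
#include <stddef.h>

/* Hypothetical lookup types, numbered from 1 as in this sketch only. */
enum {
    LOOKUP_SINGLE = 1,
    LOOKUP_MULTIPLE,
    LOOKUP_ALTERNATE,
    LOOKUP_TYPE_COUNT   /* one past the last valid type */
};

typedef int (*lookup_func)(int glyph);

/* Placeholder handlers; real handlers would apply a lookup subtable. */
int lookup_single(int glyph)    { return glyph + 1; }
int lookup_multiple(int glyph)  { return glyph * 2; }
int lookup_alternate(int glyph) { return glyph - 1; }

/* Dispatch with a switch: one case label per lookup type. */
int do_glyph_lookup_switch(int type, int glyph)
{
    switch (type) {
    case LOOKUP_SINGLE:    return lookup_single(glyph);
    case LOOKUP_MULTIPLE:  return lookup_multiple(glyph);
    case LOOKUP_ALTERNATE: return lookup_alternate(glyph);
    default:               return -1;   /* unknown lookup type */
    }
}

/* Dispatch with a call table indexed by lookup type.
 * Slot 0 is unused because types start at 1. */
static const lookup_func lookup_table[LOOKUP_TYPE_COUNT] = {
    NULL,
    lookup_single,
    lookup_multiple,
    lookup_alternate
};

int do_glyph_lookup_table(int type, int glyph)
{
    if (type <= 0 || type >= LOOKUP_TYPE_COUNT)
        return -1;                       /* unknown lookup type */
    return lookup_table[type](glyph);
}
```

The table version does a bounds check and one indirect call, so the dispatcher itself stays tiny. Whether that beats a switch depends on the compiler and optimization level, since compilers often turn a dense switch into a jump table anyway.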
Created attachment 38736
Updated pangoft2topgm benchmarking patch

Turn compiler warning off.
Are you sure that you aren't just pushing CPU time around with the bsearch patch... moving it from the Pango library to libc? While bsearch is likely more optimized than the hand-coded bsearches, my expectation is that it's not *a lot* better, and there is considerably more function call overhead using system bsearch. What are the overall run time changes from your various patches? (using 'time', say)
About pushing CPU time around: no, since I see bsearch showing up in the symbols with a reasonable (7%) share after the patch. About it not being *a lot* better, you are definitely right. That's my expectation too.

With HEAD, I switched to 'init 1' and made more accurate measurements. Each scenario was run 10 times; I dropped the way-off results, and the rest were pretty stable. The averages are presented here:

Without patches:
runs: 200, font: Nazli, text: christmas-trees-persian.utf8
16.58user 0.45system 0:17.04elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+50481minor)pagefaults 0swaps

With patches:
runs: 200, font: Nazli, text: christmas-trees-persian.utf8
15.27user 0.41system 0:15.69elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+50495minor)pagefaults 0swaps

This shows about 8% net speedup in this benchmark.
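The 8% figure can be checked from the timings above. A tiny sketch, where speedup_percent is a hypothetical helper name and the speedup is taken as the relative reduction in time:

```c
/* Relative time saved, in percent: (before - after) / before * 100. */
double speedup_percent(double before, double after)
{
    return (before - after) / before * 100.0;
}
```

Calling speedup_percent(16.58, 15.27) for user time and speedup_percent(17.04, 15.69) for elapsed time both give a value just under 8.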
I'm marking the bsearch patch obsolete: now that I'm profiling with -O3 (and with another benchmark, Federico's), it makes things slower. The other patches still save a few percent, though.
2005-11-03  Behdad Esfahbod  <behdad@gnome.org>

        Patches from #170414.  Reviewed by Matthias Clasen.

        * pango/opentype/ftxgpos.c, pango/opentype/ftxgsub.c: Use call
        table to dispatch different lookup types.

        * pango/opentype/pango-ot-buffer.c,
        pango/opentype/pango-ot-ruleset.c: Small cleanup.
2005-11-03  Behdad Esfahbod  <behdad@gnome.org>

        * examples/pangoft2pgm.c, renderdemo.c, renderdemo.h: Added a
        --runs option, useful for profiling.  Misc cleanup, freeing
        memory.  (from #170414)