Bug 104683 – Gnumeric's test-pango is slow

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 104683 - Gnumeric's test-pango is slow


Summary:	Gnumeric's test-pango is slow


Status:	RESOLVED FIXED

Product:	pango
Classification:	Platform
Component:	general
Version:	unspecified
Hardware:	Other All

Importance:	Normal normal
Target Milestone:	Medium feature
Assigned To:	pango-maint
QA Contact:	pango-maint

URL:
Whiteboard:

Duplicates:	158991 (view as bug list)
Depends on:
Blocks:

Reported:	2003-01-28 23:35 UTC by Morten Welinder
Modified:	2006-02-26 23:29 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Reduce pango cpu consumption by 70%+ (conservatively) (43.56 KB, patch) 2003-02-28 14:57 UTC, Morten Welinder	needs-work	Details \| Review
gnumeric-test-pango.sysprof.gz (190.38 KB, text/plain) 2005-11-08 20:29 UTC, Federico Mena Quintero		Details
gnumeric-test-pango.png (239.49 KB, image/png) 2005-11-08 20:35 UTC, Federico Mena Quintero		Details
gnumeric-test-pango-remote-x.sysprof.gz (249.53 KB, application/x-gzip) 2005-11-08 22:12 UTC, Federico Mena Quintero		Details
gnumeric-test-pango-remote-x.png (254.33 KB, image/png) 2005-11-08 22:14 UTC, Federico Mena Quintero		Details

Description Morten Welinder 2003-01-28 23:35:52 UTC

Currently, gnumeric uses a pangolayout per cell, say 400 on a screen.
It's painfully slow.

pango_layout_check_lines eats about 40% of the time to draw things.

The layouts typically contain short strings, have no width set, have
no white-space (and thus no newlines), and are all ascii.  It doesn't
get simpler.

Since pango_layout_set_text is already walking over the string, it could
very well notice this and somehow tell the very general machinery
that it might be worth it to take a shortcut somewhere.

Comment 1 Owen Taylor 2003-01-28 23:55:33 UTC

 - "Painfully slow" is a matter of opinion
 - It's normally that text layout takes as much time
   or more time than drawing.
 - pango_layout_check_lines() is the "do the paragraph
   layout" call. Everything except actual drawing
   happens in there.
 - I'd very much rather not special case, and I've seen
   no real evidence that special casing will help.

Comment 2 Jody Goldberg 2003-01-29 01:14:24 UTC

Gnumeric redraw is painfully slow.  Unusably so.  I'm getting alot of
complaints and we're attempting to do whatever we can.  Owen, we mean no insult
here, but something needs to be done if Gnumeric is to use pango.  Would you
accept a patch if we can find a short circuit ?

Comment 3 Owen Taylor 2003-01-29 14:58:54 UTC

"It's too slow" is simply not an acceptable bug report.
A bug report should point to a concrete action.

Without knowing what gnumeric is doing; without seeing
a profile of where it's taking the time, I can't comment
on it being too slow for you. In my experience, laying
out a screen full of text does not take significant
amounts of time with the current pango performance levels.
If you have to lay out the _entire_ worksheet before
displaying, then yes, that's probably going to be too
slow in some cases. But you'd have to make pango *much*
faster before that wasn't true. 

(Layout speed is ~100k character/sec on a 700mhz machine)

I very much doubt that useful short-circuiting is possible.
Short-circuiting means adding more layers of abstraction
and more complexity, and if the Pango code has a problem,
its too many layers of complexity and too much complexity.

There may be optimizations that can be done. I'm happy
to take bug reports about them -- e.g. bug 104495.
But a "it's too slow bug" does me no good.

Comment 4 Jody Goldberg 2003-01-29 15:25:04 UTC

We're only doing layout for the visible region.  On the order of 600
cells per area.  Doing the entire spreadsheet would be suicidal.
Our general approach seems sane.  We cache 1 pangolayout per cell, and generate
it the first time the cell is displayed.

This is not just pulling it out of the air.  Quantify is clearly pointing out
something in there.  Here's a cummulative break down of the redraw loop.

  83.17%   scg_make_cell_visible
  83.17%   gtk_layout_adjustment_changed
  83.17%   gtk_window_key_press_event
  53.90%   cell_draw			-- I expect this to be the bottleneck
  51.53%   gdk_draw_layout
  51.11%   gdk_draw_layout_with_colors
  46.10%   cell_render_value
  46.10%   row_calc_spans
  46.10%   rendered_value_new
  45.11%   pango_layout_get_extents_internal
  41.97%   pango_layout_get_extents
  40.86%   gdk_draw_layout_line_with_colors
  39.50%   pango_layout_get_pixel_size
  37.79%   pango_layout_check_lines	-- this is painful
  21.81%   pango_itemize
  18.46%   gdk_draw_glyphs
  18.08%   gdk_window_draw_glyphs
  16.83%   gdk_window_process_all_updates

Comment 5 Owen Taylor 2003-01-29 17:10:28 UTC

I can't spend more time closing this bug...

Performance improvement in Pango is always a good thing,
sure. If laying one screenful of text is hideously slow,
though, there is a gnumeric bug.

Comment 6 Morten Welinder 2003-02-09 23:50:40 UTC

> "It's too slow" is simply not an acceptable bug report.

Owen, you are acting overly defensive.

> A bug report should point to a concrete action.

What?  Rubbish.  A bug report should point out of concrete problem.
"Actions" are in the solution department, not in the reporting
department.

Owen, I can understand that you like to keep beautiful, general code.
Special-casing should not be done without cause.

I think your resistance is rooting in only testing with small cases.
Maybe only a few hundred layouts, on a local display, with state-
of-the-art cpu.  And I bet most of those layouts are static and don't
need to be redrawn in the common case.

Gnumeric has 400-600 layouts *per* *screen*.  Where you see a response
time of (say) 0.05s, we see (say) 0.2s, just because we have more
layouts.  Further, we get hit again after a scroll-down press.  That's
where "painful" comes in.  (Alternatively we could take the whole
hit during load of a spreadsheet.  That adds tens of seconds or
minutes to loading big sheets.)

Here'd s general optimization you might consider.  If the final
fragment fits completely, don't go around and look at breakpoints.
Cut-and-pasted patch below.  I have no idea how much it saves.

Index: pango/pango-layout.c
===================================================================
RCS file: /cvs/gnome/pango/pango/pango-layout.c,v
retrieving revision 1.97
diff -u -r1.97 pango-layout.c
--- pango/pango-layout.c	20 Dec 2002 17:18:51 -0000	1.97
+++ pango/pango-layout.c	5 Feb 2003 17:53:50 -0000
@@ -2734,7 +2734,8 @@
       switch (result)
 	{
 	case BREAK_ALL_FIT:
-	  if (can_break_in (layout, state->start_offset, old_num_chars,
first_item_in_line))
+	  if (state->items->next &&
+	      can_break_in (layout, state->start_offset, old_num_chars,
first_item_in_line))
 	    {
 	      have_break = TRUE;
 	      break_remaining_width = old_remaining_width;

Comment 7 Owen Taylor 2003-02-11 03:43:47 UTC

I think you are missing something here:

 pango_layout_check_lines

*IS* pango for all practical purposes.

It's the "Do the layout" function.

So, your bug is saying "Pango is slow", a bug that can 
never be closed because it will always be too slow for 
someone. It's not a sufficiciently detailed analysis of 
the problem.

(And for that reason, please file your last patch as
a separate bug, I'll just miss it on this bug.)

Comment 8 Morten Welinder 2003-02-28 14:57:51 UTC

Created attachment 14687 [details] [review]
Reduce pango cpu consumption by 70%+ (conservatively)

Comment 9 Morten Welinder 2003-02-28 15:01:22 UTC

With this patch, layouts that are very simple -- all ascii and no
whitespace -- become as fast as you would expect.  Notably that
means layouts containing numbers.

To activate, set PANGO_TRIVIAL=1.

(There are lots of other improvements in there that are not affected
by PANGO_TRIVIAL.)

Comment 10 Owen Taylor 2003-02-28 15:26:15 UTC

You obviously have spent a lot of time on this patch.
So, I feel bad at repeating what I said above:

 I very much doubt that useful short-circuiting is possible.
 Short-circuiting means adding more layers of abstraction
 and more complexity, and if the Pango code has a problem,
 its too many layers of complexity and too much complexity.

OK, perhaps you've proved that short-circuiting that 
improves performance is possible, but you've done
much violence to the Pango code base in the process.

(Also, if you are benchmarking with the pango-x backend,
that really is pretty irrelevant at this point... 
to respond to a question in some other bug report,
if you turn off antialiasing, the remote-X-server
performance of Xft without RENDER should be a lot
more competitive with the core font protocol.)

The assumption that "if text is ascii-only has no
spaces, there is no interaction between characters"
is not particular well justified either. It's very
likely that we'll support the TrueType kerning table
in basic-xft in the future, for example.

Comment 11 Morten Welinder 2003-02-28 15:47:33 UTC

The improvements in pangox are only a few percent, cpu-wise.

Why don't use start by cherry-picking from the patch?  There are
plenty of fixes and improvement that don't depent on PANGO_TRIVIAL
and thus doesn't change a pixel on the screen.

Comment 12 Owen Taylor 2003-02-28 16:10:39 UTC

My point about the X backend is that the primary virtue
of the trivial stuff seems to be avoiding the full 
shaper machinery, and the shaper machinery is entirely
different (and faster!) for the fontconfig based
backends.

Comment 13 Morten Welinder 2003-02-28 16:19:00 UTC

I get your point, but I am telling you that it is orthogonal to about
half the improvements in that patch.

Really, the caching of the first run's extents is a huge win because
the current code ends up calculating the same thing over and over again.
(And it's not as easy as it sounds to stop it from doing that.)

Likewise, glyphstrings are being realloc'ed umteen times because they
start out that size 1 -- that's totally pointless.

And pango_layout_set_text walks over the string a gazillion times.

And font descriptions get copied every single time a layout is laid
out.  (And they contain a string, so that's expensive.)

You really have to shake yourself out of that "pango is fast enough"
mindset.  It isn't, and there are lots of low hanging fruit not
related to short-circuiting the shaper.

Comment 14 Owen Taylor 2003-02-28 16:41:10 UTC

"Here's a big batch of changes to Pango, some of which 
 I've posted elsewhere, some not. You might want
 to go through and take some stuff from it."

"I don't any information about how much the individual
 changes speed up Pango, but trust me, it makes things
 faster"

Great. Thanks.

I've never seen any of the functions that you are optimizing
show up significantly on profiles, so frankly, I'm skeptical
that they are more than 1% or so speedups. Yes, 1% + 1% + 1%...
eventually gets you to 10%, but it's an excercise for
a rainy day, rather than anything that's going to solve
major performance problems.

I'll attach the benchmark that I used when testing
Soeren's approach of caching fontsets for the fontconfig
backend.

Some things like increasing the initial glyph string allocation
are in fact probably obvious; if there are separate bugs
for such issues, then I can actually close the bug when
they are applied.

Other changes in your patch are ???:

-  new_string->glyphs = g_memdup (string->glyphs,
-				 string->space * sizeof (PangoGlyphInfo));
-  new_string->log_clusters = g_memdup (string->log_clusters,
-				       string->space * sizeof (gint));
+  new_string->glyphs = g_new (PangoGlyphInfo, string->space);
+  memcpy (new_string->glyphs, string->glyphs,
+	  string->num_glyphs * sizeof (PangoGlyphInfo));
+
+  new_string->log_clusters = g_new (gint, string->space);
+  memcpy (new_string->log_clusters, string->log_clusters,
+	  string->num_glyphs * sizeof (gint));

Comment 15 Morten Welinder 2003-02-28 16:46:57 UTC

The latter is under the heading of general bug fixing.  You are
copying too much memory.  Besides being wasteful, it is undefined
behaviour.  (Spotted visually, not by profile.)

Comment 16 Morten Welinder 2003-02-28 18:33:26 UTC

Let me see...

Methodology: running gnumeric with a certain sheet that has =rand()
in A1:K3000.  After start, hit PageDown 16 times fast, wait for
redraw, then hit Ctrl-Q.  Timing only inside functions
rendered_value_new and cell_draw, thus identifying problem areas for
sheet browsing, an important area.  Gnumeric is being run remotely
and a fast LAN with PANGO_TRIVIAL=0 and GDK_USE_XFT=0.

Base run: pango + above patch + force all is_first_run to FALSE and
comment out ensure_first_run_rects call in pango_layout_get_iter.

New run: pango + above patch.

Thus we're timing only the effect of caching the first run's extents.

The tool in question is Rational Quantify 2003 with
-use-machine=UltraSparc:750MHz.  (That's not accurate, but good
enough.)  Timing of syscalls is OFF.  Since Gnumeric is a gui program,
it's very hard to create 100% exact identical runs, but the diffs
below show we are extremely close.

CONSLUSION: caching first run's extents alone speeds up the run by
11%.  (The 0.04s does not seem accurate, but that's what it says.)
Since this includes whatever time Gnumeric spends on its own behalf,
the pango speedup is somewhat higher.

Owen: all this just to show you that I am not doing voodoo
optimization.  Give me a little more trust that those "rainy day"
excuses above.

Differences between:
program /home/welinder/private/gnome/gnumeric/src/gnumeric (pid 26827)
collected on UltraSparc (750 MHz) by Quantify version 2003.06.00  and
program /home/welinder/private/gnome/gnumeric/src/gnumeric (pid 18090)
collected on UltraSparc (750 MHz) by Quantify version 2003.06.00 .
                   Function name         Calls     Cycles % change
!             pango_x_find_glyph        -253392   -7094976  52.38% faster
! pango_x_font_get_glyph_extents        -253392   -6841584  52.38% faster
! pango_glyph_string_extents_range      -31674   -6081408  63.17% faster
!           pango_x_get_per_char        -253392   -5067840  44.75% faster
!   pango_font_get_glyph_extents        -253392   -3040704  52.38% faster
!   pango_layout_run_get_extents        -31674   -1050296  74.96% faster
! unknown_static_function[libpangox-1.0.so.0/1] -506784   -1013568 
44.43% faster
!    Letext[libpango-1.0.so.0/5]        -253392    -506784  41.01% faster
! pango_layout_get_item_properties      -31674    -316740  60.55% faster
!     pango_glyph_string_extents        -31674    -190044  63.17% faster
!          pango_layout_get_iter             0    -122808  26.40% faster
! unknown_static_function[libpango-1.0.so.0/61] -31674     -63348 
52.38% faster
!                          frexp           -65      -4130   0.55% faster
!                          ldexp             0      -3055   0.81% faster
!                         _XSend            -5       -400  55.56% faster
!                       thr_main            -6        -72  25.00% faster
!                         writev            -5        -70  55.56% faster
!                       ___errno            -6        -54  25.00% faster
!                     _XFlushInt            -1        -50   6.67% faster
!                        _writev            -5        -25  55.56% faster
!                _X11TransWritev            -5        -20  55.56% faster
!          _X11TransSocketWritev            -5        -15  55.56% faster
!                          write            -1        -14   6.67% faster
!                  XDrawString16             0         -6   0.00% faster
!                         _write            -1         -5   6.67% faster
!                 _X11TransWrite            -1         -4   6.67% faster
!                        _XFlush            -1         -3   6.67% faster
!           _X11TransSocketWrite            -1         -3   6.67% faster
!            _XSetClipRectangles             0         22   0.00% slower
!                      cleanfree             1         39   0.00% slower
!                         memcpy             5        150   0.02% slower
!                           modf             0        585   0.12% slower
!                       realfree            32        608   0.01% slower
!                       t_delete            70       1647   0.09% slower
!               _malloc_unlocked             0       3652   0.04% slower
!                        t_splay           329      10818   0.62% slower
!                     update_run             0      89650  23.40% slower
!  pango_layout_line_get_extents             0     633960  49.61% slower
-        gtk_entry_motion_notify            -0         -0
39 differences; -30656895 cycles (-0.041 secs at 750 MHz)  11.25%
faster overall (including system calls).

Comment 17 Morten Welinder 2003-02-28 18:39:59 UTC

Re. filing separate bugs, what is that good for when your attitude
is so stronly against even thinking of the possiblity of performance
problems?

But hey, here are some old ones to start with.

http://bugzilla.gnome.org/show_bug.cgi?id=105500
http://bugzilla.gnome.org/show_bug.cgi?id=106892
http://bugzilla.gnome.org/show_bug.cgi?id=107138
http://bugzilla.gnome.org/show_bug.cgi?id=106925
http://bugzilla.gnome.org/show_bug.cgi?id=101997
http://bugzilla.gnome.org/show_bug.cgi?id=106973
http://bugzilla.gnome.org/show_bug.cgi?id=94718
http://bugzilla.gnome.org/show_bug.cgi?id=106942

Comment 18 Owen Taylor 2003-02-28 19:10:22 UTC

I don't know ... any benchmark that involves pressing keys, 
remote displays, network latency, has a definite tinge
of vodoo about it to me. My experience with profiling
tools is that they are useful, but seldom should be trusted
without a solid "does this optimization make the wall
clock time for this reproducible operation faster"

But anyways the reason I want separate patches is precisely
_because_ I don't necessarily want to context switch and
handle your patches right away. I'm not in patching Pango/ 
patching GTK+ mode right now ... I'm trying to get a start
on GTK+-2.4 API feature work.

About my "attitude" about performance problems, I'm perfectly
willing to accept that there may be performance improvements
possible, my objection to this bug is the blatant procedural
stupidity of such a bug report:

 - When are you going to close this bug? Should we title
   it "Pango too slow for Morten"? 

 - How the heck am I going to keep track whether fixes you
   propose in here are applied or not? It's a recipe
   for lost work.

Comment 19 Owen Taylor 2004-11-22 17:32:19 UTC

*** Bug 158991 has been marked as a duplicate of this bug. ***

Comment 20 Federico Mena Quintero 2005-11-08 20:25:35 UTC

Jody told me to run gnumeric/src/test-pango.  Here are the results as a sysprof
file, and the sysprof screenshot.

Comment 21 Federico Mena Quintero 2005-11-08 20:29:51 UTC

Created attachment 54487 [details]
gnumeric-test-pango.sysprof.gz

Comment 22 Federico Mena Quintero 2005-11-08 20:35:02 UTC

Created attachment 54490 [details]
gnumeric-test-pango.png

Note that [heap] and the X server in general take up more than 50% of the time.


The uppermost pango function in the profile is pango_renderer_draw_layout(), at
14.70% of the time.

So, the X server is the biggest culprit here.  It's probably the sucky
implementation of the RENDER extension.

I'd like to close this bug as INVALID or NOTGNOME, and have some more updated
bugs for Pango performance.  What do you think?

Comment 23 Ross Burton 2005-11-08 20:43:01 UTC

I'm having flashbacks... about 8 months ago I spend a fortnight optimising
Xft/RENDER/kdrive.  I wonder if Xorg has any of the performance problems kdrive
had. Has anyone profiled this with a modular X?

Comment 24 Morten Welinder 2005-11-08 21:05:17 UTC

Before you close it, could you try taking the X server out of the loop
by displaying remotely?

Comment 25 Federico Mena Quintero 2005-11-08 22:12:46 UTC

Created attachment 54506 [details]
gnumeric-test-pango-remote-x.sysprof.gz

Sysprof log from using a remote X server.

Comment 26 Federico Mena Quintero 2005-11-08 22:14:32 UTC

Created attachment 54507 [details]
gnumeric-test-pango-remote-x.png

Fix Cairo!

Comment 27 Federico Mena Quintero 2005-11-08 22:29:23 UTC

Since this has become a bug about Gnumeric's specific use of Pango, I'll retitle
the bug.

Comment 28 Federico Mena Quintero 2005-11-08 23:24:23 UTC

About comment #23 - Ross, do you have those patches around?  I'd love to test
them on my X server.

Comment 29 Ross Burton 2005-11-09 07:39:15 UTC

Federico: the Xft patches were committed, but the kdrive patches were rather
specific to kdrive so won't apply to xorg.  Basically the composite glyphs
implementation created a small picture for each glyph, composited those into
another picture, which was then composited onto the destination.  Mallum rewrote
this to composite directly onto the destination, leading to (IIRC) a 30% speed
increase.

Comment 30 Behdad Esfahbod 2006-02-26 23:29:51 UTC

Time to close this I believe...  Thanks to all for work on various improvements that started from this bug.