After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 315365 - Improve layout performance by shortcircuiting fribidi on unidirectional text
Improve layout performance by shortcircuiting fribidi on unidirectional text
Status: RESOLVED FIXED
Product: pango
Classification: Platform
Component: general
unspecified
Other Linux
: Normal normal
: ---
Assigned To: pango-maint
pango-maint
Depends on: 317192
Blocks:
 
 
Reported: 2005-09-06 14:43 UTC by Damon Chaplin
Modified: 2005-11-22 10:57 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Patch to speedup layout (4.96 KB, patch)
2005-09-06 14:44 UTC, Damon Chaplin
none Details | Review
mini-fribidi shortcircuit patch (4.97 KB, patch)
2005-11-03 22:53 UTC, Behdad Esfahbod
none Details | Review
new patch for both ltr and rtl (7.94 KB, patch)
2005-11-09 00:27 UTC, Behdad Esfahbod
committed Details | Review

Description Damon Chaplin 2005-09-06 14:43:04 UTC
Here's a patch to improve the performance of PangoLayout by about 15%.
(It is noticeable if you resize a large document.)

It skips all the bidi code if it isn't needed.

I'm assuming that if there are no strong RTL characters we can assume the
entire string is level 0 LTR, and if there are no strong LTR characters we
can assume it is all level 1 RTL.

It might need some tweaking to check that the paragraph level matches the
text level.
Comment 1 Damon Chaplin 2005-09-06 14:44:24 UTC
Created attachment 51874 [details] [review]
Patch to speedup layout
Comment 2 Behdad Esfahbod 2005-09-06 15:20:04 UTC
Hi Damon,

Thanks for the patch, but it seriously breaks things.  Your assumption is wrong.
 To be safe, you need to check for FRIBIDI_MASK_EXPLICIT too.  Moreover, I
really don't like touching fribidi.c, since it's not upstream code and should
and will get updated from upstream every once in a while.

The other question is whether it really has 15% improvements.  I doubt that. 
How did you profile?  Which symbols show up?  I have profiled Pango several
times and the bidi code never showed up that much, and when it does, most of the
time is spent in _pango_fribidi_get_type.  The reason is pretty simple, fribidi
run-length-encodes the types and operates on the that.
Comment 3 Damon Chaplin 2005-09-06 16:18:19 UTC
Yes, I thought there might be other things to check for. But is it possible
to get it to work?

I'm using clock() to time a simple benchmark app. For short pieces of text
I was seeing improvements around 15%.

I think the main problem with the bidi code is that it does loads of mallocs,
about one for each word in the text (in new_type_link()).
Comment 4 Behdad Esfahbod 2005-09-06 17:42:19 UTC
About the malloc problem, original fribidi code is not like that, it keeps
unused type_links around.  In pango they replaced it with g_malloc, but I
*guess* glib already does smart things about allocating lots of small pieces of
memory.

Sorry if it sounds silly, but are you running ~10 times and taking the average?

The only way that the bidi code can be short-circuited is when the text doesn't
have any characters passing FRIBIDI_IS_RTL (please don't use masks directly),
then you can safely assign level 0 to all chars.  The levels are not necessarily
all 0 if you run the algorithm (you get level 2 for digits), but that shouldn't
make any difference (other than all in a sudden your digits can kern with
neighbouring text.)
Comment 5 Damon Chaplin 2005-09-07 13:32:30 UTC
g_malloc() normally just calls malloc(). It isn't optimised.

Using something like a static GArray would be an easy way to avoid all the
mallocs. (It is actually more like 2 mallocs per word at present - one for
the word and one for the space.)

(And yes I do run the benchmark several times!)
Comment 6 Behdad Esfahbod 2005-09-07 18:19:13 UTC
ok then maybe reverting back to the MemChunk code that has been in fribidi
itself, or using a Glib MemChunk.  You can see what it used to be in the
fribidi.patch file in there.
Comment 7 Matthias Clasen 2005-09-08 01:00:53 UTC
With the current GMemChunk that won't help
Comment 8 Behdad Esfahbod 2005-09-08 01:18:04 UTC
Matthias, can you elaborate please?  With the MemChunk code that I hacked up in
FriBidi, it allocates a page at a time, which is far better than 16 bytes at a
time.  What's wrong with GMemChunk then?
Comment 9 Behdad Esfahbod 2005-09-25 23:52:54 UTC
Damon, the patches in bug 317192 make mini-fribidi use GMemChunk for TypeLinks.
 Can you test performance-wise?
Comment 10 Owen Taylor 2005-09-26 12:33:29 UTC
GMemChunk is has consistently been measured to be a bad idea; free lists
can help in some cases, but backing those free lists off of GMemChunk rather
than malloc generally doesn't make sense. (see bug 118439 for lots of 
related discussion)


Comment 11 Matthias Clasen 2005-09-26 13:40:57 UTC
Behdad, GMemChunk is actually slower than malloc. See bug 118439 for my latest
efforts to come up with a chunked allocator that can allocate small chunks
without any space overhead and is faster than malloc.
Comment 12 Behdad Esfahbod 2005-10-01 08:53:21 UTC
Damon, other than moving FriBidi to use the new allocator that Matthias is
developing in bug 118439, do you think there's anything else in this bug to do?
Comment 13 Damon Chaplin 2005-10-02 20:25:24 UTC
It might still be worth skipping the bidi code if possible, though we can't
say without benchmarking it.

(In my test, GMemChunk is over twice as fast as malloc, if you use G_ALLOC_ONLY
(which the fribidi patch does). I think it is only G_ALLOC_AND_FREE that has
too much overhead.)
Comment 14 Behdad Esfahbod 2005-10-05 00:17:32 UTC
You are right about GMemChunk, FriBidi keeps a trash stack on top of GMemChunk,
so that's an improvement anyway.

Depending on the "update mini-fribidi" bug, since all changes should go after
that update.  You may want to just bitwise-or all FriBidiCharType's and if the
RTL bit is not set, simply return a zero vector as bidi-levels.
Comment 15 Matthias Clasen 2005-10-11 06:03:19 UTC
It is a bit ugly to see all the conversions that go on between fribidi and pango
here...

1) convert from utf8 to ucs4 because fribidi expects ucs4
2) fribidi mallocs an array of character types, only to convert it 
to a home-grown linked list and free it again
3) pango takes the linked list, and converts it back into an array 
Comment 16 Behdad Esfahbod 2005-10-12 03:02:54 UTC
Humm, good idea Matthias.  It can be easily fixed actually.  In the new FriBidi
codebase I've split the main analyze function into parts such that you can
shortcircuit several parts.  It's easier to fix later, so I'd say wait until
it's in upstream.
Comment 17 Behdad Esfahbod 2005-11-03 21:19:02 UTC
All Matthias's concerns in comment 15 are resolved now.  We just need to
short-circuit all-ltr case.  Doing now.
Comment 18 Behdad Esfahbod 2005-11-03 22:53:49 UTC
Created attachment 54297 [details] [review]
mini-fribidi shortcircuit patch

Attaching a patch for mini-fribidi to simply return a zero-filled
embedding_levels array if all chars are left-to-right.	Shows some improvement
for ltr locales.
Comment 19 Matthias Clasen 2005-11-04 18:05:04 UTC
how about the case of all rtl ? 

also, looking at the pango side, all we ever seem to use from the embedding levels 
is the information where the embedding level changes occur. Maybe we can save
something by only computing that information ?
Comment 20 Behdad Esfahbod 2005-11-04 20:11:43 UTC
all rtl very very rare: space is not rtl.  Not worth it.

We can export the internal FriBidi TypeLink thing which contains run-length
encoded information including the embedding levels.  Then Pango can ask fribidi
to free it after it's done with the data, which is used (only) during
itemization.  I may work on that, but not quite right now.  We can apply the
patch for now I guess.
Comment 21 Behdad Esfahbod 2005-11-09 00:27:03 UTC
Created attachment 54511 [details] [review]
new patch for both ltr and rtl

Ok, attaching a solid patch for both ltr and rtl.
Will apply after review (Matthias?)

Remains to get rid of the embedding_levels all together, and use a
run-length-encoded structure.
Comment 22 Behdad Esfahbod 2005-11-10 00:36:34 UTC
Patch committed.

I'm working on exposing mini-fribidi run-list to pango directly, so we don't
have to populate the embeddings_list array.
Comment 23 Behdad Esfahbod 2005-11-22 10:57:20 UTC
I decided that at this stage it's probably not worth it to blur the line between
pango and mini-fribidi by exposing the run list to pango.  Feel free to reopen
if you want to submit a patch.