Bug 702389 – Layouting performance on very long lines containing many tabs is suboptimal

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 702389 - Layouting performance on very long lines containing many tabs is suboptimal


Summary:	Layouting performance on very long lines containing many tabs is suboptimal


Status:	RESOLVED OBSOLETE

Product:	pango
Classification:	Platform
Component:	general
Version:	1.34.x
Hardware:	Other All

Importance:	Normal normal
Target Milestone:	---
Assigned To:	pango-maint
QA Contact:	pango-maint

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2013-06-16 13:12 UTC by Kristian Rietveld
Modified:	2018-05-22 13:09 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
[PATCH 1/3] Bypass line reordering if all runs have analysis level zero (1.49 KB, patch) 2013-06-16 13:17 UTC, Kristian Rietveld	none	Details \| Review
[PATCH 2/3] Cache line width during construction of the line (3.07 KB, patch) 2013-06-16 13:21 UTC, Kristian Rietveld	none	Details \| Review
3/3 Redo shape_tab and get_tab_pos to have better performance for long lines (5.49 KB, patch) 2013-06-16 13:25 UTC, Kristian Rietveld	none	Details \| Review
Update 1/3 : ypass line reordering if all runs have the same direction (1.41 KB, patch) 2013-07-02 07:14 UTC, Kristian Rietveld	accepted-commit_now	Details \| Review
Update 3/3 : Redo shape_tab and get_tab_pos to have better performance for long lines (5.49 KB, patch) 2013-07-02 07:15 UTC, Kristian Rietveld	none	Details \| Review
Update 1/3 : Bypass line reordering if all runs have the same direction (1.64 KB, patch) 2013-07-14 19:36 UTC, Kristian Rietveld	committed	Details \| Review

Description Kristian Rietveld 2013-06-16 13:12:50 UTC

Consider a code editor which displays lines with wrapping disabled. Pango's layouting performance on very long lines that contain lots of tabs is suboptimal. This can be explained by the fact that tabs force the line to be broken into many runs. Because line wrapping is disabled, these runs are all put into a single PangoLayoutLine. Due to the large amount of runs some of the algorithms within Pango's layouting code exhibit suboptimal performance.

This has been tested with a string solely consisting out of tabs and normal spaces. Groups of spaces and tabs are stored in separate runs.

I have identified three areas within the layouting code where performance can be improved without too much work. I will attach three patches corresponding to these areas. The patches seem to behave well, but please NOTE that I have not done extensive testing with scripts other than English ...

Comment 1 Kristian Rietveld 2013-06-16 13:17:47 UTC

Created attachment 246945 [details] [review]
[PATCH 1/3] Bypass line reordering if all runs have analysis level zero    

For all lines, an expensive reordering algorithm is run in pango_layout_line_reorder() that reconstructs a singly linked list. In case we know that the line only contains runs with equal analysis level, this line reconstruction is not necessary. Since the length of the line needs to be determined, we can determine whether all items have the same analysis level at the same time and decide to take a shortcut based on this.

FIXME: There is the potential to further generalize this patch. I would like the know whether a patch like this would be acceptable for inclusion and whether I am not missing something (e.g. possibly there is a reason why reconstruction is always necessary?).

Comment 2 Kristian Rietveld 2013-06-16 13:21:11 UTC

Created attachment 246947 [details] [review]
[PATCH 2/3] Cache line width during construction of the line

Every call to shape_tab() to shape a tab, will call line_width(). For a line with many tabs, this inefficient function is called frequently, each time recomputing the width of the line up till now.

Even though the comment in the function mentions that keeping the current width of the line up to date everywhere is hard, this patch introduces a "cached_line_width". The cached_line_width basically corresponds to the current return value of line_width() if it would be called, avoiding performing an expensive iteration of the full line repeatedly.

Comment 3 Kristian Rietveld 2013-06-16 13:25:11 UTC

Created attachment 246949 [details] [review]
3/3 Redo shape_tab and get_tab_pos to have better performance for long lines

Next to calling line_width(), shape_tab() repeatedly calls get_tab_pos() from a for loop. For strings with a large width, get_tab_pos() will be called frequently.

The main problem is that searching for tab position begins from the start of the line, regardless of the current width of the line. This patch carefully re-engineers get_tab_pos() and shape_tab() to start looking for a suitable tab position based on the current_width of the line, so close to the target tab position. As a result, many function calls to get_tab_pos() and indirectly to pango_tab_array_*() are avoided.

Comment 4 Behdad Esfahbod 2013-06-23 03:46:30 UTC

I agree with your analysis, but prefer to find a solution that wouldn't complicate the codebase unnecessarily.  Let me find some time to take a deeper look into this.  PangoLayout is due for a rewrite soon anyway, but I don't think I'll get to do that this year or even the next :(.

If you clean up your patches to your own satisfaction, feel free to push them out.  Re the bidi stuff, you're right, all-even and all-odd should bypass reordering.

Comment 5 Kristian Rietveld 2013-06-23 19:20:43 UTC

Thanks for having a loop. I will update patch 1) with the all-even/all-odd modification soon.

Also, I just found a typo crept in patch 3 during rebasing (last_post instead of last_pos); sorry for that.

Comment 6 Kristian Rietveld 2013-07-02 07:14:54 UTC

Created attachment 248203 [details] [review]
Update 1/3 : ypass line reordering if all runs have the same direction           

Update patch to not check for all zero, but for all even or all odd.

Comment 7 Kristian Rietveld 2013-07-02 07:15:50 UTC

Created attachment 248204 [details] [review]
Update 3/3 : Redo shape_tab and get_tab_pos to have better performance for long lines

Update patch to fix a typo.

Comment 8 Kristian Rietveld 2013-07-02 07:19:17 UTC

Behdad,

(In reply to comment #4)
> I agree with your analysis, but prefer to find a solution that wouldn't
> complicate the codebase unnecessarily.  Let me find some time to take a deeper
> look into this

On a second thought, I would say that only patch 2 complicates the codebase, because it changes from computing line_width at a single location to multiple locations. But I don't see how this can be done any cleaner without hampering performance.

Patches 1 and 3 don't really complicate the code base in my opinion.


Will you find time to briefly take a deeper look any time soon? Otherwise I push the patches for more testing to master in the next week or so? (I don't know what the current release schedule is, but some more testing of the code first would be good)

Comment 9 Behdad Esfahbod 2013-07-03 17:12:58 UTC

(In reply to comment #6)
> Created an attachment (id=248203) [details] [review]
> Update 1/3 : ypass line reordering if all runs have the same direction          
> 
> Update patch to not check for all zero, but for all even or all odd.

LGTM.  Feel free to push out.  If I was to write this I would have just calculated the "bitwise and" and "bitwise or" of all the levels to avoid conditionals.

Comment 10 Kristian Rietveld 2013-07-14 19:36:54 UTC

Created attachment 249138 [details] [review]
Update 1/3 : Bypass line reordering if all runs have the same direction           

Ah, that's clever, I like bitwise tricks. Updated the patch. This is what I intend to push in the coming week.

Comment 11 Behdad Esfahbod 2013-07-15 17:03:12 UTC

LGTM.

Comment 12 Kristian Rietveld 2013-07-17 08:29:04 UTC

Thanks! Committed patch 1/3.

Comment 13 Behdad Esfahbod 2017-08-31 22:40:42 UTC

Donno what the status here is but let's close for now...

Comment 14 Kristian Rietveld 2017-09-01 09:15:51 UTC

Only Patch 1 of 3 was pushed. I didn't push patches 2 and 3 as you felt they would unnecessarily complicate the code base. These patches were needed in the product where we were seeing these performance issues however.

Comment 15 Behdad Esfahbod 2017-09-01 18:17:37 UTC

Ok, I'll try to see if I can review them.  I have a vague memory that I thought they shouldn't be necessary...

Comment 16 Behdad Esfahbod 2017-11-14 05:53:20 UTC

Leaving for Matthias...

Comment 17 GNOME Infrastructure Team 2018-05-22 13:09:51 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/pango/issues/220.