After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 123646 - Lazily compute layout->log_attrs
Lazily compute layout->log_attrs
Status: RESOLVED OBSOLETE
Product: pango
Classification: Platform
Component: general
unspecified
Other All
: Normal enhancement
: Medium feature
Assigned To: pango-maint
pango-maint
Depends on:
Blocks: 123538
 
 
Reported: 2003-10-01 17:32 UTC by Morten Welinder
Modified: 2018-05-22 12:03 UTC
See Also:
GNOME target: ---
GNOME version: Unversioned Enhancement


Attachments
Test benchmark (1.29 KB, text/plain)
2003-10-01 19:17 UTC, Owen Taylor
  Details
Work in progress patch (6.44 KB, patch)
2005-11-10 05:01 UTC, Billy Biggs
needs-work Details | Review

Description Morten Welinder 2003-10-01 17:32:05 UTC
Displaying a list (treeview) with just 100 lines each consisting of 10000
x's and nothing else takes ~5s.  This is with today's cvs.

See bug 123538 [Gnumeric text import with long lines is slow].
See bug 114943 [no way to set GtkCellRendererText to single-paragraph mode].
Comment 1 Owen Taylor 2003-10-01 17:58:05 UTC
Can you get a profile? Or produce a test case that can be
profiled? I know nothing in initial text layout that 
should be more than linear in line size.
Comment 2 Morten Welinder 2003-10-01 18:31:15 UTC
Here's a reasonably clean profile.  Syscalls, except poll, are
included, but there aren't that many except X related.

(This is from gnumeric with a 102x10000 file; measurement is
from clicking "forward" during stf import after selecting
"fixed width". Gnumeric itself does little except build a
treeview with 102 lines in it and display it.)

Function List from qgn (pid 11208)
  All 9563 functions match '*'.
  Function time (% of .root.)
Function List from qgn (pid 11208)
  All 9563 functions match '*'.
  Function+descendants time (% of .root.)

 100.00%   .root.
 100.00%   g_signal_emit
 100.00%   gtk_main
 100.00%   g_type_class_meta_marshal
 100.00%   g_main_loop_run
 100.00%   gnm_file_opener_open
 100.00%   _gtk_marshal_BOOLEAN__BOXED
 100.00%   impl_emit_verb_on
 100.00%   _start
 100.00%   signal_emit_unlocked_R
 100.00%   gtk_button_button_release
 100.00%   gdk_event_dispatch
 100.00%   gtk_button_clicked
 100.00%   gtk_propagate_event
 100.00%   gnm_file_opener_open_real
 100.00%   wb_view_new_from_input
 100.00%   marshal_VOID__USER_DATA_STRING
 100.00%   bonobo_closure_invoke_va_list
 100.00%   bonobo_ui_engine_emit_verb_on_w
 100.00%   gtk_main_do_event
 100.00%   g_main_dispatch
 100.00%   exec_verb_cb[libbonoboui-2.so.0/910]
 100.00%   button_widget_clicked_cb
 100.00%   gui_file_open
 100.00%   g_main_context_dispatch
 100.00%   cb_file_open
 100.00%   gtk_widget_event_internal
 100.00%   g_signal_emit_valist
 100.00%   main
 100.00%   gtk_widget_event
 100.00%   g_cclosure_marshal_VOID__POINTER
 100.00%   Bonobo_UIComponent_execVerb
 100.00%   g_main_context_iterate
 100.00%   gui_file_read
 100.00%   ORBit_c_stub_invoke
 100.00%   g_cclosure_marshal_VOID__VOID
 100.00%   bonobo_ui_toolbar_item_activate
 100.00%   gtk_real_button_released
 100.00%   gtk_button_released
 100.00%   bonobo_closure_invoke
 100.00%   wb_view_new_from_file
 100.00%   stf_dialog
 100.00%   g_closure_invoke
 100.00%   stf_read_workbook
 100.00%   real_exec_verb
 100.00%   impl_Bonobo_UIComponent_execVerb
 100.00%   _ORBIT_skel_small_Bonobo_UIComponent_execVerb
  93.63%   g_idle_dispatch
  65.08%   pango_layout_check_lines
  60.02%   gtk_cell_renderer_text_get_size
  59.61%   pango_layout_get_extents_internal
  58.03%   pango_layout_get_extents
  58.03%   pango_layout_get_pixel_extents
  48.52%   gdk_window_process_all_updates
  48.50%   gdk_window_process_updates_internal
  48.48%   gtk_widget_send_expose
  48.20%   gtk_tree_view_expose
  48.18%   gtk_tree_view_bin_expose
  48.04%   _gtk_tree_view_column_cell_render
  48.04%   gtk_tree_view_column_cell_process_action
  48.03%   gtk_cell_renderer_render
  48.03%   gtk_cell_renderer_text_render
  47.08%   gtk_container_idle_sizer
  45.09%   presize_handler_callback
  45.09%   do_presize_handler
  45.09%   validate_visible_area
  45.08%   validate_row
  44.70%   gtk_tree_view_column_cell_get_size
  44.69%   gtk_cell_renderer_get_size
  39.83%   g_cclosure_marshal_VOID__BOXED
  39.78%   gtk_widget_size_request
  39.78%   _gtk_size_group_compute_requisition
  39.78%   do_size_request
  39.78%   g_signal_emit_by_name
  39.77%   gtk_tree_view_size_request
  39.77%   validate_rows
  39.77%   do_validate_rows
  32.20%   gtk_paint_layout
  32.20%   gtk_default_draw_layout
  32.19%   gdk_draw_layout
  32.19%   gdk_draw_layout_with_colors
  26.11%   pango_itemize
  20.19%   process_line
  19.22%   process_item
  19.22%   pango_shape
  18.38%   _pango_engine_shape_shape
  18.38%   basic_engine_shape
  17.49%   itemize_state_init
  16.83%   get_items_log_attrs
  16.83%   pango_break
  16.83%   pango_default_break
  16.41%   pango_layout_get_iter
  16.27%   pango_font_get_glyph_extents
  15.78%   gdk_draw_layout_line_with_colors
  13.92%   pango_xft_font_get_glyph_extents
  12.61%   gdk_draw_glyphs
  12.61%   gdk_window_draw_glyphs
  12.61%   gdk_pixmap_draw_glyphs
  12.61%   gdk_x11_draw_glyphs
  12.61%   pango_xft_render
  12.61%   pango_xft_real_render
  12.52%   pango_glyph_string_extents
  12.52%   pango_glyph_string_extents_range
  12.46%   pango_script_iter_new
  12.46%   pango_script_iter_next
  12.04%   XftDrawGlyphSpec
  12.04%   XftGlyphSpecCore
  11.48%   _XftSharpGlyphMono
  10.94%   pango_layout_run_get_extents
   9.36%   pango_layout_line_get_extents
   8.73%   XFillRectangle
   8.62%   itemize_state_process_run
   7.77%   get_line_extents_layout_coords
   7.13%   pango_fc_font_get_glyph
   6.95%   XftGlyphExtents
   6.87%   set_glyph[pango-basic-fc.so/1]
   6.49%   pango_script_for_unichar
   6.10%   next_clicked
   6.02%   prepare_page
   6.02%   stf_dialog_fixed_page_prepare
   6.02%   fixed_page_update_preview
   5.97%   stf_parse_general
   5.71%   _XFlushInt
   5.71%   _write
   4.64%   stf_parse_fixed_line
   3.67%   pango_log2vis_get_embedding_levels
   3.66%   get_pair_index
   3.59%   g_utf8_get_char
   3.58%   g_utf8_validate
   3.54%   fribidi_analyse_string
   3.40%   get_shaper_and_font
   3.30%   g_string_append_unichar
   2.95%   g_unichar_type
   2.80%   g_string_insert_unichar
   2.63%   g_type_check_instance_is_a
   2.57%   shaper_font_cache_get
   2.48%   get_layout
   2.47%   gtk_widget_create_pango_layout
   2.46%   pango_layout_set_text
   2.18%   g_unichar_isspace
   2.15%   g_hash_table_lookup
   2.12%   pango_xft_font_real_get_glyph
   1.93%   pango_find_paragraph_boundary
   1.93%   _pango_fribidi_get_type
   1.86%   xft_font_get_font
   1.82%   XftFontCheckGlyph
   1.81%   g_string_insert_len
   1.74%   itemize_state_add_character
   1.58%   update_run
   1.54%   g_unichar_isgraph
   1.48%   g_unichar_break_type
   1.46%   gdk_window_update_idle
   1.17%   _pango_xft_font_map_get_info
   1.16%   XftCharIndex

 10.66%   pango_default_break
  6.36%   pango_script_for_unichar
  5.71%   _write
  5.21%   XftGlyphExtents
  4.70%   pango_xft_font_get_glyph_extents
  3.59%   g_utf8_get_char
  3.58%   g_utf8_validate
  3.53%   get_pair_index
  3.10%   XFillRectangle
  2.75%   _XftSharpGlyphMono
  2.70%   g_unichar_type
  2.49%   g_type_check_instance_is_a
  2.48%   pango_glyph_string_extents_range
  2.06%   g_unichar_isspace
  2.01%   pango_font_get_glyph_extents
  1.86%   basic_engine_shape
  1.82%   XftFontCheckGlyph
  1.73%   itemize_state_add_character
  1.67%   pango_script_iter_next
  1.67%   _pango_fribidi_get_type
  1.61%   pango_fc_font_get_glyph
  1.49%   g_hash_table_lookup
  1.41%   g_unichar_isgraph
  1.39%   xft_font_get_font
  1.35%   g_unichar_break_type
  1.28%   pango_find_paragraph_boundary
  1.28%   itemize_state_process_run
  1.17%   _pango_xft_font_map_get_info
  1.16%   fribidi_analyse_string
  1.07%   g_string_insert_len
  0.96%   g_utf8_strlen
  0.96%   process_line
  0.84%   pango_shape
  0.77%   .urem
  0.77%   g_utf8_to_ucs4_fast
  0.77%   XftCharIndex
  0.72%   stf_parse_fixed_line
  0.71%   get_shaper_and_font
  0.64%   set_glyph[pango-basic-fc.so/1]
  0.58%   pango_fc_font_get_type
  0.58%   update_embedding_end
  0.57%   pango_xft_real_render
  0.53%   g_string_insert_unichar
  0.51%   Letext[libglib-2.0.so.0/844]
  0.46%   unknown_static_function[libpangoxft-1.0.so.0/0]
  0.45%   run_length_encode_types
  0.45%   pango_xft_font_real_get_glyph
  0.45%   shaper_font_cache_get
  0.42%   memmove
  0.42%   g_string_append_unichar
  0.38%   g_unichar_to_utf8
  0.34%   Letext[libpango-1.0.so.0/5]
  0.26%   unknown_static_function[libpangoft2-1.0.so.0/0]
  0.26%   Letext[libpango-1.0.so.0/402]
  0.26%   Letext[libpango-1.0.so.0/475]
  0.24%   XftGlyphSpecCore
  0.23%   strlen
  0.23%   g_string_maybe_expand
  0.23%   compare_terminator
  0.23%   unknown_static_function[libglib-2.0.so.0/716]
  0.21%   strchr
  0.14%   Letext[libgobject-2.0.so.0/359]
  0.13%   unknown_static_function[libglib-2.0.so.0/213]
  0.13%   g_direct_hash
  0.13%   pango_log2vis_get_embedding_levels
  0.13%   Letext[libpango-1.0.so.0/166]
  0.13%   Letext[libglib-2.0.so.0/901]
  0.12%   stat
  0.07%   _read
  0.06%   memcpy
  0.06%   g_type_check_instance_cast
  0.03%   _free_unlocked
  0.03%   mutex_unlock
  0.02%   ioctl
  0.02%   realfree
  0.02%   _malloc_unlocked
  0.02%   cleanfree
  0.02%   mutex_lock
  0.01%   g_type_value_table_peek
  0.01%   g_signal_emit_valist
  0.01%   g_type_is_a
  0.01%   signal_emit_unlocked_R
  0.01%   _brk_unlocked
  0.01%   t_splay
  0.01%   g_free
  0.01%   _smalloc
  0.01%   g_malloc
  0.01%   g_type_check_value
  0.01%   malloc
  0.01%   g_type_check_is_value_type
  0.01%   free
  0.01%   strcmp
  0.01%   g_datalist_id_set_data_full
  0.01%   param_spec_pool_hash

Cummulatively...

Function List from qgn (pid 11208)
  All 9563 functions match '*'.
  Function+descendants time (% of .root.)

 100.00%   .root.
 100.00%   g_signal_emit
 100.00%   gtk_main
 100.00%   g_type_class_meta_marshal
 100.00%   g_main_loop_run
 100.00%   gnm_file_opener_open
 100.00%   _gtk_marshal_BOOLEAN__BOXED
 100.00%   impl_emit_verb_on
 100.00%   _start
 100.00%   signal_emit_unlocked_R
 100.00%   gtk_button_button_release
 100.00%   gdk_event_dispatch
 100.00%   gtk_button_clicked
 100.00%   gtk_propagate_event
 100.00%   gnm_file_opener_open_real
 100.00%   wb_view_new_from_input
 100.00%   marshal_VOID__USER_DATA_STRING
 100.00%   bonobo_closure_invoke_va_list
 100.00%   bonobo_ui_engine_emit_verb_on_w
 100.00%   gtk_main_do_event
 100.00%   g_main_dispatch
 100.00%   exec_verb_cb[libbonoboui-2.so.0/910]
 100.00%   button_widget_clicked_cb
 100.00%   gui_file_open
 100.00%   g_main_context_dispatch
 100.00%   cb_file_open
 100.00%   gtk_widget_event_internal
 100.00%   g_signal_emit_valist
 100.00%   main
 100.00%   gtk_widget_event
 100.00%   g_cclosure_marshal_VOID__POINTER
 100.00%   Bonobo_UIComponent_execVerb
 100.00%   g_main_context_iterate
 100.00%   gui_file_read
 100.00%   ORBit_c_stub_invoke
 100.00%   g_cclosure_marshal_VOID__VOID
 100.00%   bonobo_ui_toolbar_item_activate
 100.00%   gtk_real_button_released
 100.00%   gtk_button_released
 100.00%   bonobo_closure_invoke
 100.00%   wb_view_new_from_file
 100.00%   stf_dialog
 100.00%   g_closure_invoke
 100.00%   stf_read_workbook
 100.00%   real_exec_verb
 100.00%   impl_Bonobo_UIComponent_execVerb
 100.00%   _ORBIT_skel_small_Bonobo_UIComponent_execVerb
  93.63%   g_idle_dispatch
  65.08%   pango_layout_check_lines
  60.02%   gtk_cell_renderer_text_get_size
  59.61%   pango_layout_get_extents_internal
  58.03%   pango_layout_get_extents
  58.03%   pango_layout_get_pixel_extents
  48.52%   gdk_window_process_all_updates
  48.50%   gdk_window_process_updates_internal
  48.48%   gtk_widget_send_expose
  48.20%   gtk_tree_view_expose
  48.18%   gtk_tree_view_bin_expose
  48.04%   _gtk_tree_view_column_cell_render
  48.04%   gtk_tree_view_column_cell_process_action
  48.03%   gtk_cell_renderer_render
  48.03%   gtk_cell_renderer_text_render
  47.08%   gtk_container_idle_sizer
  45.09%   presize_handler_callback
  45.09%   do_presize_handler
  45.09%   validate_visible_area
  45.08%   validate_row
  44.70%   gtk_tree_view_column_cell_get_size
  44.69%   gtk_cell_renderer_get_size
  39.83%   g_cclosure_marshal_VOID__BOXED
  39.78%   gtk_widget_size_request
  39.78%   _gtk_size_group_compute_requisition
  39.78%   do_size_request
  39.78%   g_signal_emit_by_name
  39.77%   gtk_tree_view_size_request
  39.77%   validate_rows
  39.77%   do_validate_rows
  32.20%   gtk_paint_layout
  32.20%   gtk_default_draw_layout
  32.19%   gdk_draw_layout
  32.19%   gdk_draw_layout_with_colors
  26.11%   pango_itemize
  20.19%   process_line
  19.22%   process_item
  19.22%   pango_shape
  18.38%   _pango_engine_shape_shape
  18.38%   basic_engine_shape
  17.49%   itemize_state_init
  16.83%   get_items_log_attrs
  16.83%   pango_break
  16.83%   pango_default_break
  16.41%   pango_layout_get_iter
  16.27%   pango_font_get_glyph_extents
  15.78%   gdk_draw_layout_line_with_colors
  13.92%   pango_xft_font_get_glyph_extents
  12.61%   gdk_draw_glyphs
  12.61%   gdk_window_draw_glyphs
  12.61%   gdk_pixmap_draw_glyphs
  12.61%   gdk_x11_draw_glyphs
  12.61%   pango_xft_render
  12.61%   pango_xft_real_render
  12.52%   pango_glyph_string_extents
  12.52%   pango_glyph_string_extents_range
  12.46%   pango_script_iter_new
  12.46%   pango_script_iter_next
  12.04%   XftDrawGlyphSpec
  12.04%   XftGlyphSpecCore
  11.48%   _XftSharpGlyphMono
  10.94%   pango_layout_run_get_extents
   9.36%   pango_layout_line_get_extents
   8.73%   XFillRectangle
   8.62%   itemize_state_process_run
   7.77%   get_line_extents_layout_coords
   7.13%   pango_fc_font_get_glyph
   6.95%   XftGlyphExtents
   6.87%   set_glyph[pango-basic-fc.so/1]
   6.49%   pango_script_for_unichar
   6.10%   next_clicked
   6.02%   prepare_page
   6.02%   stf_dialog_fixed_page_prepare
   6.02%   fixed_page_update_preview
   5.97%   stf_parse_general
   5.71%   _XFlushInt
   5.71%   _write
   4.64%   stf_parse_fixed_line
   3.67%   pango_log2vis_get_embedding_levels
   3.66%   get_pair_index
   3.59%   g_utf8_get_char
   3.58%   g_utf8_validate
   3.54%   fribidi_analyse_string
   3.40%   get_shaper_and_font
   3.30%   g_string_append_unichar
   2.95%   g_unichar_type
   2.80%   g_string_insert_unichar
   2.63%   g_type_check_instance_is_a
   2.57%   shaper_font_cache_get
   2.48%   get_layout
   2.47%   gtk_widget_create_pango_layout
   2.46%   pango_layout_set_text
   2.18%   g_unichar_isspace
   2.15%   g_hash_table_lookup
   2.12%   pango_xft_font_real_get_glyph
   1.93%   pango_find_paragraph_boundary
   1.93%   _pango_fribidi_get_type
   1.86%   xft_font_get_font
   1.82%   XftFontCheckGlyph
   1.81%   g_string_insert_len
   1.74%   itemize_state_add_character
   1.58%   update_run
   1.54%   g_unichar_isgraph
   1.48%   g_unichar_break_type
   1.46%   gdk_window_update_idle
   1.17%   _pango_xft_font_map_get_info
   1.16%   XftCharIndex

Comment 3 Owen Taylor 2003-10-01 19:17:37 UTC
Created attachment 20413 [details]
Test benchmark
Comment 4 Owen Taylor 2003-10-01 19:20:30 UTC
With the attached benchmark, I get:

Laying out 1000000 chars
String length = 10 ... took 3.751364 seconds
String length = 100 ... took 1.745634 seconds
String length = 1000 ... took 1.538398 seconds
String length = 5000 ... took 1.531216 seconds

Which seems to indicate that layout gets faster as strings
get longer. Maybe you can modify the test case to show the
problem you are having? Nothing really stands out in your
profile above ... the percentages are about what I'd expect.
Comment 5 Morten Welinder 2003-10-01 20:39:18 UTC
Your numbers do not disagree with mine -- I can easily see your machine
being three times faster than mine on a per-cpu basis.

Still, 1.5s for displaying what really amounts to one screenful of
text?  I would much prefer that to be less than 0.1s.

Why is it trying to find break-points to begin with, btw?  The whole
10000x thing is going to be set one line.
Comment 6 Owen Taylor 2003-10-14 23:14:44 UTC
I'd say a screenful of data is 80x25 characters canonically - 
2000 characters. So, we're talking 500 screenfuls of data.

If you want to consider this 1 screen of data, then you
need something different ... a grid based data-layout widget,
that can know that it's only displaying the first horizontal
page of data and can ignore the rest.

pango_break() is not just used to determine line breaks,
it also is used for cursor navigation, for selection, eventually
it will be used for justification. At the point that initial
layout is done, Pango has no way of knowing that it won't
later need that information,.

It's probably possible to compute layout->log_attrs lazily, and 
I'd accept a clean patch to do that.

But in the end, that's ~10 percent of the run time, so it's
not going to make things 15 times faster as you seem to want.
Comment 7 Owen Taylor 2004-12-13 20:49:45 UTC
It would probably be a noticeable speed improvement for lots of unwrapped short
strings (good) but it  would mean multiple scans over the text and attributes for 
the cases where we do need the attributes (bad).

If you were going to do this, I'd rip out get_items_log_attrs() entirely
and replace by computing the log attrs for the entire layout at one
pass.
Comment 8 Billy Biggs 2005-11-10 05:01:20 UTC
Created attachment 54562 [details] [review]
Work in progress patch

I'm experimenting with lazily computing the logical attributes.  Besides the
public API to return the array, it's mostly just the cursor positions which are
required internally by pango.  The other flags used are is_char_break and
is_line_break, which are only required during wrapping.  Besides the public
APIs which request it, the cursor positions are also used for supporting letter
spacing.

My proposal is to lazily calculate all of the logical attributes whenever we
require the cursor positions or the logical attributes are requested, and
always calculate them if we are wrapping or using letter spacing.

There are two issues with this patch:
  1. It currently re-itemizes in order to calculate the logical attributes.
     How can this be avoided?  Should the itemization be cached?  Is it really
     needed?
  2. I don't know how to detect whether we require letter spacing early enough.
Comment 9 Owen Taylor 2005-11-14 04:03:05 UTC
I'd prefer if possible to see an approach where we *always* generate
the log attrs lazily ... this will make things cleaner and also more
efficient, rather than sometimes doing things lazily and sometimes
not. Questions like "how to detect whether we require letter spacing 
early enough" just drop away ... early enough is whenever we need
the attributes.

I think we can separate out two cases:

 A) We need the log attrs *while* processing a paragraph. At this point,
    we still have the PangoItems that we would need to compute the log
    attrs ... true, they are separated between the already generated
    lines and ParaBreakState, but they are there.

 B) We need the log attrs at some later point.

From these two cases we can consider two strategies:

Strategy 1) We handle the two cases separately - in case A) we 
  we work off the current itemization, in case B) we reitemize.

Strategy 2) We handle the two cases together. This would probably
  mean always computing the information from get_items_log_attrs()
  and storing it away - this is basically a typically extremely
  short list of:

   Range: PangoLangEngine, PangoLanguage
 
  In virtually all cases, this list is of length 1. I guess
  since pango_check_lines() works paragraph by paragraph, you'd 
  need one element per paragraph, but even so, it's a compact set of
  information; a small array stored for the PangoLayout.

  (In fact, they "dirty secret" is that pango_break() currently
  ignores this information entirely , since we have no non-default 
  language engines, and no tailoring of the default engine to 
  particular languages. I don't think we should hardcode that into
  PangoLayout, however.)

In general, I like the second approach better and I don't think
it should be hard to implement. 
Comment 10 Behdad Esfahbod 2006-01-22 17:36:58 UTC
Some random comment found in 5-year old TODO.xml that is not quite irrelevant here:

        When width == -1 and there is only a single line, then a lot
        of the <code>PangoLayout</code> code makes an
        unecessary call to <code>pango-layout_get_extents</code>.
Comment 11 Behdad Esfahbod 2006-02-14 21:08:05 UTC
Currently pango_break does sentence breaks, word breaks, line breaks, cursor position (grapheme breaks), and char breaks.  Char break and line break are computed together, but otherwise, pango_default_break is a loop over the string, doing four almost decoupled operations, computing these different break attrs.

So my question is, shall we separate these and only compute (lazily) whichever we need?  I can imagine there would be more looping overhead, but that may as well be offset by the shorter loops that are more cache-friendly.

Separating these different attrs also makes it possible to do tailoring in modules that only override the attrs that they need.  For example, if a module wants to override the line breaking attr, it will only implement that and the rest will go through the default implementation.  Currently the module has no choice other than 1) duplicating the default_break code and change the line break parts of it, or 2) call default_break, and override the line break attrs, essentially, paying the default line_break penalty while not using it.

So, I propose a new set of functions for accessing log attrs and break, that take a mast of some kind (and can also return a mask of what it did support and filled in, such that you can actually query what special features does a module provide?).  That should make pango_break more useful for AbiWord too, which is hesitant to call it right now, because they only need cursor position for example...
Comment 12 Mart Raudsepp 2007-04-24 02:07:42 UTC
I've been working on separating pango_default_break out into four different function and got the sentence break separate before lost free time. I hope to continue work on this in May - if someone else is interested (I hope so), I can post a WIP patch.
One good reason for me to do that, beyond the ability to lazy compute, was to get an easier view on the assembler that is being generated, and the low-level dependence chains thereof and maybe even pipeline/ALU utilization and such if I felt really really crazy ;)

Meanwhile, some thoughts on the performance impacts of this that I gathered during that work and low-level optimization research done afterwards, in case that's useful for someone - if even to just laugh at me being misguided in anything :)

* We need less temporary variables in one swoop, so possibly better cache locality.

* Obviously we can leave some work out if we don't need sentence breaks for example, but there are dependence chains as well. For example line breaks reuse the analyzing done for cursor position (grapheme breaks)

* Related to that, there are some dependence chains in the big logic. It might be stalling things, but probably not so much if at all.
For example there is immediate usage of is_cursor_position to get backspace_deletes_character, but as the latter isn't needed anywhere soon, hopefully the compiler is able to move it down or CPU is able to just pre-emptively continue with the line breaking algorithm while cursor position write is still in the pipeline and only stall backspace_deletes_character as it doesn't contain any function calls to complicate predictions. Might want to move it down in the source code a bit in a strategical position (for assembler level micro-optimization), regardless.

* Breaking it up doesn't seem to introduce any new loop-carried dependencies, as we don't address attrs[i-1], but only attrs[i] plus usage of prev_wc

* Almost all reads and writes going on with attrs[i] struct members are unaligned (bitfield). If the logic is all in one block, it _might_ be easier to optimize in countermeasures for unaligned access. For example where we need previous cursor position calculation result (if a grapheme break can happen at attrs[i] or not), we might be able to get rid of the unaligned read. I'd need to read some of the generated assembler closer to see if unaligned access is a possible problem at all.

* Partial memory writes (due to bitfields again) could stall read of other properties in the struct, or vice-versa. I need to study this type of problem a lot more before commenting anything further here.


Measuring any of this needs quite some benchmarking, for which the pango-profile module is useful. I think ideally it would call only pango_break on the strings.


PS: Where would we store what type of breaks have been calculated on a given layout? PangoAnalysis doesn't feel right (but I'm ignorant of that structures purpose) and neither does storing it in PangoLogAttr where it would have to be stored for each character..
Comment 13 Behdad Esfahbod 2007-04-24 02:17:21 UTC
I'm rewriting break.c to comply with latest Unicode standard and plan to exactly do what you suggest.  Can you just develop a simple benchmark while you are at it?

As for where to store what we have computed, a PangoLayout member is fine.  The problem however is that since it's a bitfield, we need to do lots of bit twiddling as you say to add a new bit, as opposed to when we have a single loop, we can compute the entire attr[i] and assign it.

The only case we have to compute all properties is when pango_get_log_attrs is called.  But we can add a new version of that that take a PangoLogAttr which marks which fields are wanted.  (Or a new bitfield type).
Comment 14 Mart Raudsepp 2007-04-24 14:05:05 UTC
(In reply to comment #13)
> Can you just develop a simple benchmark while you
> are at it?

I can take a look once I'm back at my own machine next week or beyond :)

> As for where to store what we have computed, a PangoLayout member is fine.

The problem that I had is that no PangoLayout is passed to pango_break or pango_default_break (which are in the public API). Only PangoAnalysis and PangoLogAttr, of which the latter is an array whose every member represents one character or byte and not suitable for storing the fact that calcs are done unless we do it for the first index only or some such hackish stuff or write an additional bit per every single character, which quite likely kills some performance.

> The problem however is that since it's a bitfield, we need to do lots of bit
> twiddling as you say to add a new bit, as opposed to when we have a single
> loop, we can compute the entire attr[i] and assign it.
> 
> The only case we have to compute all properties is when pango_get_log_attrs is
> called.  But we can add a new version of that that take a PangoLogAttr which
> marks which fields are wanted.  (Or a new bitfield type).

An additional PangoLogAttr sounds good - we can do a bitwise AND to see which logical parts are necessary. For example with just is_white, it could just go the g_unichar_is_space loop. Then again, it might become more complex than necessary for passing in a suitable value there (possibly need to create the PangoLogAttr from bits in user code, etc)
Comment 15 Javier Jardón (IRC: jjardon) 2009-05-29 00:27:16 UTC
Dear hackers, 

Any news on this bug ?

regards
Comment 16 Behdad Esfahbod 2009-05-29 12:16:50 UTC
(In reply to comment #15)
> Dear hackers, 
> 
> Any news on this bug ?

Not really.  Why?
It will be reported here if any progress is made.

Cheers,
behdad


> regards
> 

Comment 17 GNOME Infrastructure Team 2018-05-22 12:03:24 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/pango/issues/10.