GNOME Bugzilla – Bug 70451
Automatic paragraph direction according to Unicode BiDi algorithm
Last modified: 2005-06-03 22:10:57 UTC
The direction of a paragraph (RTL or LTR) should be calculated automatically unless it has been overriden. This would probably need introducing one more direction type, PANGO_DIRECTION_AUTO, that should be the default direction of paragraphs in the GtkTextView widget. When the direction is PANGO_DIRECTION_AUTO, the direction should become the direction returned by fribidi. To implement this properly, there is also a need to deal with paragraphs that are composed only of neutral characters (E.g. a single of line of ">" in a quated email message). Such neutral paragraphs should inherit their direction by the previous paragraph. If the cursor is in a neutral paragraph, the paragraph direction may be according to the keyboard. An RTL layout would cause the paragraph to be RTL. The only remaining decision is what to do with a first neutral paragraph. This may be solved by introducing a default global direction that will be used unless something else was specified. But in practice, it doesn't really matter.
Dov: I'm not an expert, but I'm trying to learn :) Shouldn't this bug be filed against pango?
It is somewhere between the border between GtkTextView and pango. I believe that pango exports enough functionality to solve this problem within the context of GtkTextViewer, but there might very well be some small changes to pango as well. In short, both Havoc and Owen need to be involved.
Is this an essential feature ("text widget unusable without it") or a "would be nice"? Trying to pick a milestone.
Though I think it is part of "normal" behaviour for a non-sophisticated user, I don't think it is importent enough to delay Gtk v2.0. Therefore I switched it to target milestone 2.2. I might have a look at it myself, if I can get some guidance of where such functionality should be inserted in the code.
Making all 'high' textview bugs 2.2.0 as per havoc's request.
It sounds to me like the first step here is a pango API bug for a way to get the autodetected direction of a PangoLayout. Just pango_layout_get_direction() or something presumably. (Unless we already have that)
Created attachment 14977 [details] [review] Proposed pango and fribidi additions for gtktextviewer auto bidi dir
Created attachment 14978 [details] [review] Proposed changes for gtk text viewer automatic paragraph dir according to BiDi algorithm.
The two patches that I just commited carries out automatic paragraph direction according to the BiDi direction. The patches still lack some functionality in order to be release level: 1. The cursor dissappears in the right margin for RTL paragraphs.I don't know where the code is for this one. 2. The direction for texts where all paragraphs only contains neutral characters is not honoring the style. This should be easy to fix.
Created attachment 14979 [details] [review] gtk+-changes for auto bidi dir. This patch replaces 03/12/03 16:10.
I just added a new patch for gtk+ that replaces the previous patch that I sent in. This new patch solves the two remaining problems that I mentioned earlier, namely: 1. It now chooses the direction according to the style, if the buffer doesn't contain any strong bidi characters. 2. The problem with the disappearing cursor has been fixed. As far as I am concerned this patch is already good enough for integration. Please review, and give ok for commiting.
Created attachment 15048 [details] [review] gtk+-changes for auto bidi dir. This patch replaces 03/12/03 17:01
I just added a third patch that fixes one efficiency problem with the previous patch. Namely that the cursor line direction is determining the line direction if there is no strong character in the line. In order to do this I need to compare the current line being rendered with the cursor line. But in order to do that I need to access the cursor line. In my previous patch I got the cursor line by querying for the position of the "insert" mark at rendering time. I now changed it so that gtk_text_view manually updates the cursor line whenever it is changed. Unfortunately this meant scattering gtk_text_view with lots of calls to gtk_text_layout_update_cursor_line(text_view->layout). One additional change that is actually a small api change in this third buffer is that I changed the routine gtk_text_layout_set_cursor_direction (text_view->layout, new_dir); with the new call: gtk_text_layout_set_cursor_and_keyboard_direction(). I seriously doubt that anyone but gtktextview is calling this function, but if that is a problem, then the old function may always be left behind and be made obsolete.
Comment on your last comment - haven't looked at the patch yet. Note that anybody can move the insertion cursor without going through gtktextview.c at all, so scattering update calls isn't going to work. See gtk_text_view_mark_set_handler for how the text view tracks the location of the insertion mark. GtkTextLayout could do the same.
Adding Matthias and Havoc to the Cc: since they won't see the GtkTextLayout part of this bug (which is the big complicated part, really) otherwise.
Created attachment 15063 [details] [review] gtk+-changes for auto bidi dir. This patch replaces 03/15/03 16:29.
I just submitted yet another patch for gtk that implements the feedback from Owen's last comment. The cursor position is now updated by catching the three signals "mark_set", "insert_text", and "delete_range" from the gtk_text_buffer. (Further optimizations are probably possible, but it is not clear how much effort is worthing doing in order not to call: gtk_text_buffer_get_iter_at_mark (layout->buffer, &iter, gtk_text_buffer_get_mark (layout->buffer, "insert")); ) I have also cleaned away some old dead code that I by mistake left behind in the previous patches.
Adding myself and Behdad Esfahbod to the CC list.
One general comment - in the future you might want to use 'diff -up' instead of 'diff -u' (you can put that in your ~/.cvsrc). It makes reading patches a fair bit easier since you get information about which function each change is in. Comments on Pango patch: - Ugh, this is a (small) mess. PANGO_DIRECTION_TTB_LTR and PANGO_DIRECTION_WEAK_LTR really aren't the same thing at all. But it's going to be hard to split them apart at this point. (Bidi text in vertical writing is actually a possible, if rare thing ... think of a Chinese book with some embedded Arabic or Hebrew words. If you had an embedded Arabic quote, you can actually need to do weak/strong directional resolution for the vertical-text.) I think the right thing to do is probably to use PangoDirection for Bidi direction, deprecate PANGO_DIRECTION_TTB_RTL and introduce new API for vertical writing when we actually have this. (To make your head spin, CJK is typically PANGO_DIRECTION_TTB_RTL, in which case, you want to handle embedded non-vertical text with 90 degree clockwise rotation, so LTR text goes top-to-bottom and RTL text bottom to top. On the other hand, Mongolian is PANGO_DIRECTION_TTB_LTR; conceptually a 90 degree _counter-clockwise_ rotation, so that RTL text text goes from top to bottom. If you are mixing Mongolian and Arabic, Cyrillic, or English, things are pretty clear - you want the Mongolian characters to be strong-RTL. However, if you are mixing Mongolian and Chinese, then both scripts should go top-to-bottom, so, the Chinese characters have to switch their Bidi nature to be RTL as well!) - There is a pangoft2.c change that seems to have nothing to do with the rest of the patch and needs to be filed separately. - I don't think the pango_find_base_dir / find_base_dir_utf8 distinction is good; text passed into Pango is in UTF-8 always. - pango_find_base_dir() (taking UTF-8 input) should accept the <0 => nul-terminated convention used throughout GLib and Pango. - There is a typo in pango_find_base_dir_utf8() docs. - I think pango_direction_strong_to_weak() is just too specialized to be worth having in the Pango API. I think code is going to be easier to understand if it just hardcodes the operation. - The functions should be in pango-utils.h not pango-break.h, I think. - Prototypes in the header files need to be formatted like the other prototypes. - pango-enums.c is autogenerated, so shouldn't be in the patch. - pango_get_char_dir() I'd call pango_unichar_direction() to match all the g_unichar_* functions. (Yes, there is pango_get_mirror_char(), but I think the analogy there is weaker.)
I've spent some time looking at the GtkTextView patch now, and also have some comments there. On a large scale, my biggest concern when looking at the patch is the need to search through surrounding text in some cases. While it probably isn't going to matter much in practice, it worries me that the work to insert N blank lines is going to be O(N^2). This is something we've tried hard to avoid in GtkTextView. It strikes me that by storing, instead of dir_propagated, two fields, dir_propagated_back and dir_propagated_forward (dir_back, dir_forward?) where: dir_propagated_forward is "paragraph direction, if any, otherwise dir_propagated_forward of previous paragraph" dir_propagated_back is "paragraph direction, if any, otherwise, dir_propagated_back of next paragraph" There would be no need to search, and in fact, the logic in _gtk_text_btree_resolve_bidi() would get significantly simpler. Since we had to add a whole 32 bytes to add any extra fields, it really doesn't cost any more space to save three directions rather than just two. Another thing I saw in _gtk_text_btree_resolve_bidi() that made me wonder was the bounds for invalidation when the region extends to the end of the buffer - it looks like the last line will never get invalidated because of the 'line = prev; break;' I'm not sure if this a real problem or not ... is the last line even displayed? But if it isn't a problem, it needs to be explained in a comment. I don't want to get into line by line comments at the moment, but some general coding style notes: - Block comments should have a solid line of * down the left side.. - When breaking lines on operators, especially &&, ||, the style is that the operator goes on the previous line. (Yes, this is different from the GNU coding guidelines) - In a few cases, somehow the closing ) got moved to a separate line. (I think it's worth reading through patches before submission to make sure there aren't accidental whitespace changes in the patch. A couple of other thing I noticed ... In set_para_values(), the logic really looks overcomplicated and a bit jumbled. There should be only one call to pango_layout_new (layout->rtl_context) and one call to pango_layout_new (layout->ltr_context) not mulltipel copies of each. In gtk_text_layout_invalidate_cursor_line(), I don't see why gtk_text_layout_update_cursor_line() is called. Why would the cursor_line be invalid at that point?
Created attachment 15871 [details] [review] gtk+-changes for auto bidi dir. This patch replaces 03/12/03 16:01
Oops! The last attachment should read "pango changes for auto bidi direction! and not gtk+-changes for auto bidi. (I wish there was a bugzilla undo mechanism!) In any case these changes are really nothing more than a straightforward application of all of Owens comments from 2003-04-01 17:49. Let me know if is ok to commit.
Created attachment 15873 [details] [review] A test implementation of line insertion and deletion algorithm in perl for testing.
I just added another attachement with a new algorithm for the bidi resolution in which I have split the dir_propagated_forward and dir_propagated_backward as suggested by Owen on 2003-04-02 10:56. Indeed this turns the algoritm into O(N) as Owen wrote. I wrote the algorithm in perl for my own testing before I actually go ahead and change the gtktextbuffer code. It is probably not useful for anyone else, except perhaps for describing the algorithm for someone writing a different widget implementation...
Created attachment 15899 [details] [review] gtk+-changes for auto bidi dir. This patch replaces 03/16/03 17:04
I just submitted a new gtk+ patch that implements all Owens comments from 2003-04-02 10:56 (including the algorithm that I implemented in perl in the 04/21/03 8:57) attachement. The algorithm now works in O(N) as required by Owen. But, I believe there still is a problem with the patch, that I don't understand what it comes from. In testtext everything looks fine. But with gedit I get the following error message: (gedit:29097): Gtk-CRITICAL **: file gtktextbuffer.c: line 334 (set_table): assertion `buffer->tag_table == NULL' failed and in gimp-1.3.14 the text tool eventually crashes after I got several error messages: (gimp-1.3:29098): Gtk-CRITICAL **: file gtktextbuffer.c: line 1857 (gtk_text_buffer_get_mark): assertion `GTK_IS_TEXT_BUFFER (buffer)' failed (gimp-1.3:29098): Gtk-CRITICAL **: file gtktextbuffer.c: line 1794 (gtk_text_buffer_get_iter_at_mark): assertion `GTK_IS_TEXT_MARK (mark)' failed (gimp-1.3:29098): Gtk-CRITICAL **: file gtktextbuffer.c: line 1857 (gtk_text_buffer_get_mark): assertion `GTK_IS_TEXT_BUFFER (buffer)' failed (gimp-1.3:29098): Gtk-CRITICAL **: file gtktextbuffer.c: line 1794 (gtk_text_buffer_get_iter_at_mark): assertion `GTK_IS_TEXT_MARK (mark)' failed (gimp-1.3:29098): GLib-GObject-CRITICAL **: file gobject.c: line 1355 (g_object_get_qdata): assertion `G_IS_OBJECT (object)' failed I don't know exactly where these changes come from. But it is clear that it has to do in one way or another with the part of the patch that uses the cursor direction for bidi direction. I would very much appreciate help in debugging this. If I won't solve this, I will have to erase that part of the patch, which will significantly deteriorate functionality. 8-(
Dov, I think your problems are due to the mark_set being emitted after the layout has already been finalized. Adding g_signal_handlers_disconnect_by_func (layout->buffer, G_CALLBACK (gtk_text_layout_mark_set_handler), layout); g_signal_handlers_disconnect_by_func (layout->buffer, G_CALLBACK (gtk_text_layout_buffer_insert_text), layout); g_signal_handlers_disconnect_by_func (layout->buffer, G_CALLBACK (gtk_text_layout_buffer_delete_range), layout); to the if (layout->buffer) { } block in gtk_layout_set_buffer() should fix this (since gtk_layout_finalize() calls gtk_layout_set_buffer (layout, NULL))
Created attachment 18389 [details] [review] Pango changes for auto bidi dir. This patch replaces 04/21/03 02:24.
Created attachment 18390 [details] [review] gtk+-changes for auto bidi dir. This patch replaces 04/22/03 07:06
I have just uploaded two patches against pango-1.2.3 and gtk+-2.2.2 that includes: 1. All the functionality of the previous patches. 2. A tiny bugfixes in pango/mini_fribidi 3. The bug fix from Matthias Clasen (Thanks!!!) As far as I am concerned there is now nothing that stops the inclusion of this patch in HEAD. (Note that the patch still does not implement the auto dir functionality for the entry widget. Should we carry over this bug for the entry widget functionality as well, or open a new bug?)
Created attachment 18576 [details] [review] gtk+-changes for auto bidi dir. This patch replaces 07/17/03 15:01
I just added a new gtk+ patch that in addition to the text widget functionality that was included before now also includes patches for gtkentry and gtklabel to implement the corresponding functionality for them. I believe this completes the functionality implementation for gtk+ .
I've split the GTK+ changes off into three separate bugs, since keeping track of everything on this bug report was getting confusing. I've spent some time playing around with this code, and have new versions of the patches, mostly cleaning up indentation and adding some comments. I'll post those soon. Some random thoughts about the Pango API: - The problem with PangoDirection is that it has lots of values, and it's not clear what values are valid where. For example, I believe that the weak directions are only really meaningful for the return value for pango_unichar_direction(), and that calling pango_context_set_direction (context, PANGO_DIRECTION_WEAK_RTL) would be non-sensical. - As it turns out, pango_find_base_direction() is the only part of the new API that actually gets used in GTK+... - From the GTK+ changes, it is pretty clear that it would be nice to have pango_layout_set_direction(); not a 1.4 item, but I filed bug 118456 on the subject. Summary of changes I made in the attached patch: - Removed unused prototype in pango-context.h - Added g_return_if_fail() to pango_find_base_dir - Fix bug in pango_find_base_dir() if length is not at a character boundary, and also (to conform to Pango convention), truncate at embedded \0 - Improve find_base_dir docs a bit, use % markup on %PANGO_DIRECTION_NEUTRAL. - Various indentation fixes - Treat PANGO_DIRECTION_TTB_RTL, PANGO_DIRECTION_TTB_LTR like PANGO_DIRECTION_LTR PANGO_DIRECTION_RTL respectively, rather than PANGO_DIRECTION_WEAK_LTR - Remove HAVE_FRIBIDI external-fallbacks support; all the rest of that is already gone. - Move PangoDirection docs into the header file, expand significantly. (Though I think now what I wrote is wrong in quite a few details.) - Move pango_get_mirror_char() pango_unichar_direction() (along with pango_get_mirror_char())into pango-types.h. pango-util.h is stuff is more randomly-exported internals, and I don't want to include it into GTK+ widgets.
Created attachment 18702 [details] [review] My version of Pango patch
Bug reference above for pango_layout_set_direction() should be bug 118546.
Created attachment 20705 [details] [review] Enhanced version of pango-patch that includes auto-dir
I have just uploaded an extended Pango patch (built on Owen's patch) that includes functionality for setting the bidi direction according to the contents of the layout. In addition, just like in gtk_text_layout it swaps the meaning of left and right alignment if the base direction of the paragraph is RTL. This patch is a prerequisite for solving #118541 (auto direction for labels) since a label is really nothing more than a place holder for a PangoLayout object.
Committed my changes, will look at your enhanced version next: Fri Feb 27 09:30:10 2004 Owen Taylor <otaylor@redhat.com> Add some new enum and values and utilities for supporting automatically determined base direction. (#70451, based on changes by Dov Grobgeld) * pango/pango-types.h docs/tmpl/main.sgml: Add PANGO_DIRECTION_WEAK_RTL/LTR, extend the docs for PangoDirection. * pango/pango-types.h pango/pango-utils.h: Move pango_get_mirror_char() to pango-types.h. * pango/mini-fribidi/fribidi.c (pango_log2vis_get_embedding_levels): Handle new values of PangoDirection, handle PANGO_DIRECTION_TTB_LTR/RTL as aliases for PANGO_DIRECTION_RTL/LTR. * pango/mini-fribidi/fribidi.c pango/pango-types.h: Add pango_unichar_direction(). * pango/pango-utils.c pango/pango-types.h: Add pango_find_base_dir()
My biggest concern here is the flipping of justification. Should it be? if (line->layout->auto_dir && line->resolved_dir != pango_context_get_base_dir (context)) (With some handling of the context's base direction possibly being weak) That seems more natural to me from an API standpoint, though the end result for the user will be about the same. I'm not completely convinced that the end result for the user is right here, consider: Team Assignment ==== ================================== Owen GtkFileChooser Jon Owen GNITIDE IDIB VOD All labels set to wrap. The label on the left would definitely be better off with all lines right aligned. Basically, when the a LTR programmer says "right aligned", they probably actually want right aligned, also for text that is naturally right aligned. On the other hand, when a left-to-right programmer says "left aligned", they probably just mean "displayed nicely". The labels on the right are arguably the way you would want them. So, I could actually make some argument that the correct flipping algorithm is: "If resolved direction is different from the baseline, and the text would be aligned at the leading edge, align it to the trailing edge" Of course, then that is awfully magic and confusing if it doesn't do what you mean. But I guess you can always turn auto direction off. Am I just confusing myself here or does the above make some sense? I'm going to attach a revised version of your patch with some cleanups and fixes, but hold off on committing it for the moment. * Changing pango_itemize() isn't possible; it's public API. So, added pango_itemize_with_base_dir() instead. * Default --auto-dir to true, so make the argument --no-auto-dir * Make resolved_dir field ofPangoLayout a bitfield and put it at the end to avoid binary compat breakage. * Canonicalize boolean argument and call pango_layout_clear_lines in pango_layout_set_auto_dir(). * Got rid of base_dir temporaries in pango-layout.c since they weren't making things any clearer any more. * Moved base_dir argument to process_line into ParaBreakState * Fixed a bunch of problems with missing spaces between function name and opening (.
Created attachment 24868 [details] [review] My version of auto-direction PangoLayout
Regarding the if(), I'm not sure whether the context base direction should be referenced at all. In my gtk_entry() and gtk_text_widget() I just silently ignored it alltogether. I think this all boils down to whether ALIGN_LEFT means "align according to the base direction" or "align left no matter what the base direction is". I believe that the the first interpretation is one that would cause the least breakage, and that is the interpretation that I have used for all of my auto-direction patches. You should clearly specify which interpretation you prefer. The if you suggest is consistant with the second interpretation above. One solution would be to introduce additional enums called ALIGN_WITH_BASE_DIRECTION or ALIGN_AGAINST_BASE_DIRECTION. Then, if someone someone specifies ALIGN_LEFT or ALIGN_RIGHT it would be taken literally. But this is probably exaggerating. In any case, I believe it is much more important having the patch in gtk than whether the alignment is one way or another. Regarding the example, you are of course right it would look nicer if VOD was right aligned, but I think implementing such logic would be too convoluted... Just trying to figure out how ALIGN_RIGHT of the latin labels should be interpreted in a RTL locale when the packing direction has been flipped, makes my mind flip...
I decided to go with the intermediate approach of "reverse alignment when resolved_dir != base_dir". I think this is more consistent with the behavior when !auto_dir than what you had, but is still reasonably simple. It also works with the current GtkLabel code, which already reverses the justification it passes to Pango. Fri Feb 27 11:23:21 2004 Owen Taylor <otaylor@redhat.com> Patch from Dov Grobgeld to add auto-direction to PangoLayout (more of #70451) * examples/pangofttopgm.c: Add --no-auto-dir argument. * pango-layout.[ch]: Add pango_layout_set_auto_dir() defaulting to TRUE; resolve paragraph direction within a layout by propagating base direction downwards from paragraph to paragraph. * pango-context.[ch]: Add pango_itemize_with_base_dir() which overrides the base direction from the PangoContext.
Wrt vertical bidi (comment #19), see Unicode Technical Note #22 http://www.unicode.org/notes/tn22/