GNOME Bugzilla – Bug 89541
Iteration by characters, clusters broken
Last modified: 2006-01-03 09:41:52 UTC
Using PangoLayoutIter to iterate by characters or clusters doesn't work at all currently, due to a variety of problems, in particular: - Confusion over whether indices are - Problems with figuring out what the end index is for a rtl run. (The end index for the ltr run "abc" is 3 - 0,1,2 are character indices. But for a rtl run CBA the character indices are 2,1,0 and currently, the end index is 0 as well, which doesn't work.)
Created attachment 10153 [details] Start of a test program
*** Bug 118786 has been marked as a duplicate of this bug. ***
*** Bug 120891 has been marked as a duplicate of this bug. ***
Fixed the test case in 120891. Fri Oct 31 12:32:38 2003 Owen Taylor <otaylor@redhat.com> Fix one problem with iteration by chars (Part of #89541, Mariano Suárez-Alvarez) * Pango/pango-layout.c (cluster_end_index): Fix to be item relative, like iter->cluster_index. * pango/pango-layout.c (pango_layout_iter_next_char): Adapt.
*** Bug 125990 has been marked as a duplicate of this bug. ***
*** Bug 126832 has been marked as a duplicate of this bug. ***
*** Bug 127331 has been marked as a duplicate of this bug. ***
*** Bug 127481 has been marked as a duplicate of this bug. ***
*** Bug 127650 has been marked as a duplicate of this bug. ***
*** Bug 127728 has been marked as a duplicate of this bug. ***
*** Bug 128445 has been marked as a duplicate of this bug. ***
*** Bug 130459 has been marked as a duplicate of this bug. ***
*** Bug 130501 has been marked as a duplicate of this bug. ***
*** Bug 134196 has been marked as a duplicate of this bug. ***
Major amount of work. First step is probalby coming up with test cases.
*** Bug 149050 has been marked as a duplicate of this bug. ***
*** Bug 149202 has been marked as a duplicate of this bug. ***
*** Bug 156997 has been marked as a duplicate of this bug. ***
This still seems to be a crasher for bug 154148, despite bug 120891 being resolved. The crasher in bug 154148 practically disables using Nautilus to manage Hebrew-named files. I'll try to come up with testcases next week.
It makes nautilus to crash a lot. see bug 149759 or bug 149202. Because it's a crasher, the severity should change to critical, and the priority to high. I think this bug should be solved before 2.10 goes out. Thanks, Yuval
Folks - the bug in general here is a huge amount of ill-defined work. If you have *particular* crashers, *particular* test cases would be extremely useful.
Created attachment 36370 [details] Simple testcase Particular testcase? You've got it. Iterates over a simple string of "<HEBREW ALEPH><HEBREW BET><HEBREW GIMEL>" (first three letters in the Hebrew alphabet). Crashes on Pango from CVS. (BTW, if you shorten the testcase's string to just "<ALEPH><BET>", it'll crash due to a different assertion. Shorten it to just "<ALEPH>", and it'll crash due to an even different one.)
Created attachment 37729 [details] [review] bidi support for PangoLayoutIter I have made this patch, which seems to solve the probs on my machine. It should be tested with other forms of text (including Arabic, and also Latin with combining marks). Please read detailed documentation in http://live.gnome.org/PangoLayoutIterBidiSupport
Minding this is a known crasher with a patch, is there any work being done to review this (now that GNOME 2.10 release date is over)?
Raising priority on this bug-report since it seems to be a problem that needs to be solved and it has a nicely documented patch as well as a test-case attached.
Created attachment 46068 [details] [review] Patch regenerated with diff -up I've regenerated the patch using 'diff -up'. A default format diff is almost impossible to read, so you should always use 'diff -u'.
Comment on attachment 46068 [details] [review] Patch regenerated with diff -up I've commented extensively on the Wiki page about how I think the iteration should work, and given some thoughts about implementation technique. I'm not going to comment on the current patch in detail since I don't think it quite implements the right thing though. Let me know if you are going to have time to work on implementing what I sketched out. Otherwise, I'll try to find some time to do it myself.
Very little time for at least one week. You'd probably do a better job, but if your'e busy and willing to wait ~2 weeks, I'll give it a go (your sketched plan seems good). Please have a look at my wiki updates, to make sure I understand correctly. I might decide to have a look at the code during my vacation this weekend.
Created attachment 47014 [details] [review] reimplement char & cluster iteration please review. I also have a test program which I'll upload after some cleanup
Created attachment 47113 [details] test program for PangoLayoutIter checks iteration by character. verifies x coords and total number of iterations. todo: * add support for \r\n * consider alternatives for using gtk * other test texts (read from file)
*** Bug 305493 has been marked as a duplicate of this bug. ***
I don't know if it is needed, but I felt it is important to report that the patch fixes the nautilus crash with arabic filenames I have been experiencing. Thanks.
*** Bug 307977 has been marked as a duplicate of this bug. ***
*** Bug 308158 has been marked as a duplicate of this bug. ***
*** Bug 308267 has been marked as a duplicate of this bug. ***
*** Bug 304414 has been marked as a duplicate of this bug. ***
Created attachment 48135 [details] [review] Patch I'm applying to head Here's what I'm applying to Pango HEAD. If no problems turn up in the next few days, I'll backport to the pango-1-8 branch.
Notes on differences: Fixed various occurrences of ++i Fixed various usages of gint, gchar to int, char Rewrote update_cluster() logic; next_logical_cluster_start_rtl(), run_glyph_to_index() were confusing me because they didnt' have a meaning that could be described in less than the amount of code which was in them. In update_cluster(), glyphs->num_glyphs() can never be zero, so removed check and comment. Renamed textp to text, pcluster to cluster_text. In pango_layout_iter_next_char() eliminated inline preincrement. The need for the tolerance of 1 in checking cluster extents in the test case actually was a real problem: consider a 3 character cluster with width 10. With was there before, you'd get, for the three clusters (0, 3) (3, 6) (6, 9), while the right result is (0, 3) (3, 6) (6, 10) [keeping the current truncation behavior]: I rewrote pango_layout_iter_get_char_extents to compute the left and right boundaries of the cluster. The need for "extra_eol" in the test case actually was a real problem; next_char() and next_cluster() should always go to the next char resp. next cluster when there are more chars/clusters, and should not include the fake end-of-line runs. Fixing this up was quite a bit of work, and I probably broke something... I reworked the test case a fair bit: - Made it use Cairo instead of GTK+ for creating a context - Clean up the use of varargs macros which aren't portable - Add a simple test for cluster iteration 2005-06-21 Owen Taylor <otaylor@redhat.com> Fix up the operation of PangoLayoutIter, especially for Bidi (#89541, based on a patch from Amit Aronovitch) * pango/pango-layout.c: Many changes to make iteration consistently in visual order. * pango/pango-layout.c (pango_layout_iter_next_char): Iterate through each character in the layout exactly once. (Including a hack to get two iterator positions for \r\n) * pango/pango-layout.c (pango_layout_iter_next_cluster): Only iterate through real clusters: that is, positions in the layout that have glyphs. * tests/testiter.c tests/Makefile.am: Add a (somewhat reworked) test from Amit for the operation of PangoLayoutIter.
*** Bug 310092 has been marked as a duplicate of this bug. ***
Committed to pango-1-8 as well.
*** Bug 325583 has been marked as a duplicate of this bug. ***