GNOME Bugzilla – Bug 753772
Justify option inserts spaces in wrong positions
Last modified: 2015-08-23 18:10:08 UTC
Created attachment 309501 [details] A program generating justified text in a constrained space This is a continuation of issue #542706 from 7 years ago. In spite of what I wrote then, this bug was not properly solved. I'm attaching a similar example program that illustrates that the problem still exists, even with English. The example renders the same text twice. The first time with justification turned on and the second without. I would have expected that no space would be inserted within a word with latin characters, but still it is, as can be seen that the "k" and the "l" are further from one another than the non-justified part at the bottom. I believe that this is the same problem that is seen with hebrew text which has random spaces inserted inside of a word. An investigation I did a couple of years ago indicated that this might be related to glyphs->glyphs[i].attr.is_cluster_start being true for the second character.
Can you attach screenshots showing the problem please? Thanks.
Created attachment 309597 [details] Output of the program pango-justify-bug. (Sorry. I forgot to attach this file yesterday) The screenshot shows the same text "kl ahd" (from "kol ehad" (each one) in Hebrew) rendered twice with pando in a very narrow space. At the top the text is layed out with justify and at the bottom without. I would expect that the interglyph distance within a word not to be affected by whether we are justified or not. But this assumption is broken in the screenshot. Again, this is a toy problem, and if needed I can generate a "real" problem happening when Hebrew text is justified.
That's working as expected. If there are no space characters on the line, we use letterspacing to justify: http://mces.blogspot.co.uk/2007/05/justified-text-with-pango.html
Ok. So the toy example is not related to the real problem. Instead I am attaching a new version of the program that renders a longer Hebrew text that is layed out erroneously when justified. I'm also attaching the program output that has been annotated with red circles around areas where spaces were wrongly inserted.
Created attachment 309644 [details] Program rendering a Hebrew text with and without justification.
Created attachment 309645 [details] Annotated output of the previous c-program. The red circles indicate areas where spaces were wrongly inserted.
In general, much easier to reproduce if you paste pango-view commands to show a bug. Can you check if this was working before Khaled's two recent changes?
*** Bug 753859 has been marked as a duplicate of this bug. ***
Behdad, Next time I promise to use pango-view. Meanwhile I'm sure you'll manage to copy and paste the compilation command at the top of the c-file. :-) But perhaps there are optional debug options in pango that pango-view turns on? I'll check out whether Khaled's changes have anything to do it later today, but my gut feeling is that it is the same bug that I reported and retracted back in 542706 .
Here's a pango-view command for your test: $ pango-view --text "כל בני האדם נולדו בני חורין ושווים בערכם ובזכויותיהם. כולם חוננו בתבונה ובמצפון, לפיכך חובה עליהם לנהוג איש ברעהו ברוח של אחווה" --width=600 --margin=10 --justify --font="serif 40" It works fine for me. Does it reproduce for you with this command?
After playing around a bit with pango-view did I realize that the problem does not always reproduce. But the problem did appear when doing: $ pango-view --text "כל בני האדם נולדו בני חורין ושווים בערכם ובזכויותיהם. כולם חוננו בתבונה ובמצפון, לפיכך חובה עליהם לנהוג איש ברעהו ברוח של אחווה" --width=360 --margin=10 --justify --font="DejaVu Serif 20" But not when doing: $ pango-view --text "כל בני האדם נולדו בני חורין ושווים בערכם ובזכויותיהם. כולם חוננו בתבונה ובמצפון, לפיכך חובה עליהם לנהוג איש ברעהו ברוח של אחווה" --width=360 --margin=10 --justify --font="serif 20" What is strange is that fc-match lists DejaVu Serif as its first match: $ fc-match serif --all | head DejaVuSerif.ttf: "DejaVu Serif" "Book" :
Could it possibly be that pango is using metrics for one font during layout, but then using another font for the rendering? How can I check this?
What's happening probably is that the default font for Hebrew is something else. What do you get for fc-match serf:lang=he? That this is font-specific is a huge surprise... Let me check the code....
I just tested the fc-match self:lang=he three times while twice erasing the matching font. The result was that the glyph outlines changed, but they were the same between the two invocations of above, and the spacing bug remained. This is what caused me to guess that there is a mismatch between the info supplied for the font metrics, and the glyphs actually used for the outline. The three matching fonts were: $ fc-match serif:lang=he HadasimCLM-Regular.ttf: "Hadasim CLM" "Regular" # Erased HadasimCLM-Regular.ttf $ fc-match serif:lang=he times.ttf: "Times New Roman" "Regular" # Erased times.ttf $ fc-match serif:lang=he Tinos-Regular.ttf: "Tinos" "Regular" Oops. I just realized that "DejaVu Serif" does not contain any Hebrew glyphs. So perhaps there are different fallback mechanisms for the font metrics and the glyph outlines?
Can you jump into justify_words() in pango-layout.c and see what's going on?
A quick initial look give the impression that there is a mixup between the visually ordered clusters and the logical ordered log_attrs. I don't have more time to look at it now. Meanwhile the following is a more compact command line that triggers the problem: run --text "אאאאאאאאא. בבב גגגגגגגג" --width=160 --margin=10 --justify --font="DejaVu Serif 20" (the problem disappears if the period is erased... Strange)
While pango-view is open, run pmap on the process and grep for fonts. See which font is actually used...
Oh, I might know what's going on. Dang. Hold on.
Reproduced.
Working on fix...
Fixed a crasher, and this bug.. However, discovered a bug with justify_cluster and rtl, as well as multiple runs... ./pango-view --text "אאאאאאאאא. בבב abc גגגגגגגג.אאאאאאאאא. בaב גגגגגגגג.אאאאאאאאא. בבב גגגגגגגג.אאאאאאאאא. בבב גגגגגגגג.אאאאאאאאא. בבב גגגגגגגג.אאאאאאאאא. בבב גגגגגגגג.אאאאאאאאא. בבב גגגגגגגג.אאאאאאאאא. בבב גגגגגגגג." --width=160 --margin=10 --justify --font="DejaVu Serif 20" working on that.
All fixed now. commit 79c7874ad58d32c2b265c725d5e7e81504046595 Author: Behdad Esfahbod <behdad@behdad.org> Date: Fri Aug 21 14:36:20 2015 +0100 Fix justify_clusters() Was totally b0rked. Much better now. Should be good enough... commit bf500a961f0cbdb1f5b3ac9bb71080e7613df23d Author: Behdad Esfahbod <behdad@behdad.org> Date: Fri Aug 21 13:35:32 2015 +0100 Fix offset calculation for justify Bug 753772 - Justify option inserts spaces in wrong positions https://bugzilla.gnome.org/show_bug.cgi?id=753772 Went unnoticed for 8 years... commit cc3df43e9703cc4765358ae4e9e67c8199d5c37c Author: Behdad Esfahbod <behdad@behdad.org> Date: Fri Aug 21 13:31:45 2015 +0100 Fix crasher in justify code Ouch!
Much much better! Thanks! But... If the last character in the line (first character on the left) is punctuation character, then that character is not left aligned to the left margin, but there a white space like a space character to the left of it.
Can't repro. pango-view command or screenshot?
ok changing paragraph to ltr, everything breaks :))
Ok, please try again...
Created attachment 309871 [details] Animated gif showing the difference before and after the patch Yeah! Now it is finally working as it should. The attached animated gif rendered from a pango markup version of the first few verses of "song of songs" shows the huge difference before and after the latest changes. Now, if instead of inserting spaces we could widen certain characters like kashida in Arabic... ;-) Thanks, Behdad.
Thanks Dov. I don't know the script so can't appreciate the difference! But true justification is part of my plan for the next few years!
Suffice it to say that before the fix there were arbitrary spaces inserted in the middle of words, that have now been fix. The final results is really as good as it gets for for Hebrew. Regarding kashidas, they are not used in modern Hebrew at all, but they are used in the handwritten bible scrolls and were used in old praying books. It would be a neat exercise to build a font and a the necessary justification logics to support it, but if nobody even complained that normal justification was broken in pandas for eight years, nobody is likely to use kashida logics. But this is all totally unrelated to this fixed bug.