After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 753772 - Justify option inserts spaces in wrong positions
Justify option inserts spaces in wrong positions
Status: RESOLVED FIXED
Product: pango
Classification: Platform
Component: general
unspecified
Other Linux
: Normal normal
: ---
Assigned To: pango-maint
pango-maint
: 753859 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2015-08-18 18:36 UTC by Dov Grobgeld
Modified: 2015-08-23 18:10 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
A program generating justified text in a constrained space (1.70 KB, text/plain)
2015-08-18 18:36 UTC, Dov Grobgeld
Details
Output of the program pango-justify-bug. (1.31 KB, image/png)
2015-08-19 13:34 UTC, Dov Grobgeld
Details
Program rendering a Hebrew text with and without justification. (1.87 KB, text/x-csrc)
2015-08-19 21:27 UTC, Dov Grobgeld
Details
Annotated output of the previous c-program. (36.15 KB, image/png)
2015-08-19 21:29 UTC, Dov Grobgeld
Details
Animated gif showing the difference before and after the patch (130.88 KB, image/gif)
2015-08-22 19:47 UTC, Dov Grobgeld
Details

Description Dov Grobgeld 2015-08-18 18:36:08 UTC
Created attachment 309501 [details]
A program generating justified text in a constrained space

This is a continuation of issue #542706 from 7 years ago. In spite of what I wrote then, this bug was not properly solved. I'm attaching a similar example program that illustrates that the problem still exists, even with English. 

The example renders the same text twice. The first time with justification turned on and the second without. I would have expected that no space would be inserted within a word with latin characters, but still it is, as can be seen that the "k" and the "l" are further from one another than the non-justified part at the bottom.

I believe that this is the same problem that is seen with hebrew text which has random spaces inserted inside of a word.

An investigation I did a couple of years ago indicated that this might be related to glyphs->glyphs[i].attr.is_cluster_start being true for the second character.
Comment 1 Behdad Esfahbod 2015-08-18 23:28:56 UTC
Can you attach screenshots showing the problem please?  Thanks.
Comment 2 Dov Grobgeld 2015-08-19 13:34:37 UTC
Created attachment 309597 [details]
Output of the program pango-justify-bug.

(Sorry. I forgot to attach this file yesterday)

The screenshot shows the same text "kl ahd" (from "kol ehad" (each one) in Hebrew) rendered twice with pando in a very narrow space. At the top the text is layed out with justify and at the bottom without.

I would expect that the interglyph distance within a word not to be affected by whether we are justified or not. But this assumption is broken in the screenshot.

Again, this is a toy problem, and if needed I can generate a "real" problem happening when Hebrew text is justified.
Comment 3 Behdad Esfahbod 2015-08-19 13:46:26 UTC
That's working as expected.  If there are no space characters on the line, we use letterspacing to justify:

  http://mces.blogspot.co.uk/2007/05/justified-text-with-pango.html
Comment 4 Dov Grobgeld 2015-08-19 21:26:29 UTC
Ok. So the toy example is not related to the real problem.

Instead I am attaching a new version of the program that renders a longer Hebrew text that is layed out erroneously when justified.

I'm also attaching the program output that has been annotated with red circles around areas where spaces were wrongly inserted.
Comment 5 Dov Grobgeld 2015-08-19 21:27:40 UTC
Created attachment 309644 [details]
Program rendering a Hebrew text with and without justification.
Comment 6 Dov Grobgeld 2015-08-19 21:29:00 UTC
Created attachment 309645 [details]
Annotated output of the previous c-program.

The red circles indicate areas where spaces were wrongly inserted.
Comment 7 Behdad Esfahbod 2015-08-20 09:58:21 UTC
In general, much easier to reproduce if you paste pango-view commands to show a bug.

Can you check if this was working before Khaled's two recent changes?
Comment 8 Behdad Esfahbod 2015-08-20 10:23:52 UTC
*** Bug 753859 has been marked as a duplicate of this bug. ***
Comment 9 Dov Grobgeld 2015-08-20 11:57:45 UTC
Behdad,

Next time I promise to use pango-view. Meanwhile I'm sure you'll manage to copy and paste the compilation command at the top of the c-file. :-) But perhaps there are optional debug options in pango that pango-view turns on?

I'll check out whether Khaled's changes have anything to do it later today, but my gut feeling is that it is the same bug that I reported and retracted back in 542706 .
Comment 10 Behdad Esfahbod 2015-08-20 12:09:31 UTC
Here's a pango-view command for your test:

$ pango-view --text "כל בני האדם נולדו בני חורין ושווים בערכם ובזכויותיהם. כולם חוננו בתבונה ובמצפון, לפיכך חובה עליהם לנהוג איש ברעהו ברוח של אחווה" --width=600 --margin=10   --justify --font="serif 40"

It works fine for me.  Does it reproduce for you with this command?
Comment 11 Dov Grobgeld 2015-08-20 12:50:52 UTC
After playing around a bit with pango-view did I realize that the problem does not always reproduce. But the problem did appear when doing:

$ pango-view --text "כל בני האדם נולדו בני חורין ושווים בערכם ובזכויותיהם. כולם חוננו בתבונה ובמצפון, לפיכך חובה עליהם לנהוג איש ברעהו ברוח של אחווה" --width=360 --margin=10   --justify --font="DejaVu Serif 20"

But not when doing:

$ pango-view --text "כל בני האדם נולדו בני חורין ושווים בערכם ובזכויותיהם. כולם חוננו בתבונה ובמצפון, לפיכך חובה עליהם לנהוג איש ברעהו ברוח של אחווה" --width=360 --margin=10   --justify --font="serif 20"

What is strange is that fc-match lists DejaVu Serif as its first match:

$ fc-match serif --all | head
DejaVuSerif.ttf: "DejaVu Serif" "Book"
:
Comment 12 Dov Grobgeld 2015-08-20 12:53:34 UTC
Could it possibly be that pango is using metrics for one font during layout, but then using another font for the rendering? How can I check this?
Comment 13 Behdad Esfahbod 2015-08-20 15:11:42 UTC
What's happening probably is that the default font for Hebrew is something else.  What do you get for fc-match serf:lang=he?

That this is font-specific is a huge surprise...  Let me check the code....
Comment 14 Dov Grobgeld 2015-08-20 16:10:32 UTC
I just tested the fc-match self:lang=he three times while twice erasing the matching font. The result was that the glyph outlines changed, but they were the same between the two invocations of  above, and the spacing bug remained. This is what caused me to guess that there is a mismatch between the info supplied for the font metrics, and the glyphs actually used for the outline. The three matching fonts were:

$  fc-match serif:lang=he
HadasimCLM-Regular.ttf: "Hadasim CLM" "Regular"
#  Erased HadasimCLM-Regular.ttf

$  fc-match serif:lang=he
times.ttf: "Times New Roman" "Regular"
# Erased times.ttf

$ fc-match serif:lang=he
Tinos-Regular.ttf: "Tinos" "Regular"

Oops. I just realized that "DejaVu Serif" does not contain any Hebrew glyphs. So perhaps there are different fallback mechanisms for the font metrics and the glyph outlines?
Comment 15 Behdad Esfahbod 2015-08-21 10:31:30 UTC
Can you jump into justify_words() in pango-layout.c and see what's going on?
Comment 16 Dov Grobgeld 2015-08-21 11:56:59 UTC
A quick initial look give the impression that there is a mixup between the visually ordered clusters and the logical ordered log_attrs. I don't have more time to look at it now. Meanwhile the following is a more compact command line that triggers the problem:

 run --text "אאאאאאאאא. בבב גגגגגגגג" --width=160 --margin=10   --justify --font="DejaVu Serif 20"

(the problem disappears if the period is erased... Strange)
Comment 17 Behdad Esfahbod 2015-08-21 12:02:37 UTC
While pango-view is open, run pmap on the process and grep for fonts.  See which font is actually used...
Comment 18 Behdad Esfahbod 2015-08-21 12:02:57 UTC
Oh, I might know what's going on.  Dang.  Hold on.
Comment 19 Behdad Esfahbod 2015-08-21 12:03:39 UTC
Reproduced.
Comment 20 Behdad Esfahbod 2015-08-21 12:04:42 UTC
Working on fix...
Comment 21 Behdad Esfahbod 2015-08-21 12:44:08 UTC
Fixed a crasher, and this bug..  However, discovered a bug with justify_cluster and rtl, as well as multiple runs...

./pango-view --text "אאאאאאאאא. בבב abc גגגגגגגג.אאאאאאאאא. בaב גגגגגגגג.אאאאאאאאא. בבב גגגגגגגג.אאאאאאאאא. בבב גגגגגגגג.אאאאאאאאא. בבב גגגגגגגג.אאאאאאאאא. בבב גגגגגגגג.אאאאאאאאא. בבב גגגגגגגג.אאאאאאאאא. בבב גגגגגגגג." --width=160 --margin=10 --justify --font="DejaVu Serif 20"

working on that.
Comment 22 Behdad Esfahbod 2015-08-21 13:37:34 UTC
All fixed now.

commit 79c7874ad58d32c2b265c725d5e7e81504046595
Author: Behdad Esfahbod <behdad@behdad.org>
Date:   Fri Aug 21 14:36:20 2015 +0100

    Fix justify_clusters()
    
    Was totally b0rked.  Much better now.  Should be good enough...

commit bf500a961f0cbdb1f5b3ac9bb71080e7613df23d
Author: Behdad Esfahbod <behdad@behdad.org>
Date:   Fri Aug 21 13:35:32 2015 +0100

    Fix offset calculation for justify
    
    Bug 753772 - Justify option inserts spaces in wrong positions
    https://bugzilla.gnome.org/show_bug.cgi?id=753772
    
    Went unnoticed for 8 years...

commit cc3df43e9703cc4765358ae4e9e67c8199d5c37c
Author: Behdad Esfahbod <behdad@behdad.org>
Date:   Fri Aug 21 13:31:45 2015 +0100

    Fix crasher in justify code
    
    Ouch!
Comment 23 Dov Grobgeld 2015-08-21 14:04:36 UTC
Much much better! Thanks! But... If the last character in the line (first character on the left) is punctuation character, then that character is not left aligned to the left margin, but there a white space like a space character to the left of it.
Comment 24 Behdad Esfahbod 2015-08-21 14:13:34 UTC
Can't repro.  pango-view command or screenshot?
Comment 25 Behdad Esfahbod 2015-08-21 14:14:09 UTC
ok changing paragraph to ltr, everything breaks :))
Comment 26 Behdad Esfahbod 2015-08-21 15:28:26 UTC
Ok, please try again...
Comment 27 Dov Grobgeld 2015-08-22 19:47:37 UTC
Created attachment 309871 [details]
Animated gif showing the difference before and after the patch

Yeah! Now it is finally working as it should. The attached animated gif rendered from a pango markup version of the first few verses of "song of songs" shows the huge difference before and after the latest changes.

Now, if instead of inserting spaces we could widen certain characters like kashida in Arabic... ;-)

Thanks, Behdad.
Comment 28 Behdad Esfahbod 2015-08-23 17:42:32 UTC
Thanks Dov.  I don't know the script so can't appreciate the difference!  But true justification is part of my plan for the next few years!
Comment 29 Dov Grobgeld 2015-08-23 18:10:08 UTC
Suffice it to say that before the fix there were arbitrary spaces inserted in the middle of words, that have now been fix. The final results is really as good as it gets for for Hebrew.

Regarding kashidas, they are not used in modern Hebrew at all, but they are used in the handwritten bible scrolls and were used in old praying books. It would be a neat exercise to build a font and a the necessary justification logics to support it, but if nobody even complained that normal justification was broken in pandas for eight years, nobody is likely to use kashida logics. But this is all totally unrelated to this fixed bug.