GNOME Bugzilla – Bug 589113
Some characters rotated incorrectly in vertical text
Last modified: 2009-08-03 20:27:36 UTC
Please describe the problem:
When displaying vertical text, some characters are rotated incorrectly. The affected characters are:
- The block U+2FF0 to U+2FFB, Ideographic Description Characters (only in some special circumstances)
- The character U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK (only in some special circumstances)
- The fullwidth characters in the block U+FF01 to U+FFEE, Halfwidth and Fullwidth Forms (only some of these characters are affected)

Steps to reproduce:
1. Display some text that includes the listed characters in a vertical layout.

Actual results: Some of the characters are rotated incorrectly.
Expected results: All characters are rotated correctly.
Does this happen every time? Yes
Created attachment 138794: Incorrect rendering
Created attachment 138797: Correct rendering
Created attachment 138801: This solves the problem, but I don't know if it is the best way to solve it.
I should use g_unichar_iswide(), I guess. That wouldn't handle U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK, though. I'm not sure why that one should not be rotated. I think it should be rotated, but the font should have a GSUB rule that substitutes a better glyph.
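To make the width-based idea concrete, here is a minimal C sketch. It assumes a hypothetical sketch_is_wide() that stands in for g_unichar_iswide() and covers only a few East Asian Wide/Fullwidth ranges relevant to this bug, approximately, not the full EastAsianWidth.txt data. Wide characters stay upright in vertical text; narrow ones are rotated. Note that U+30FC counts as wide, so a width test alone keeps it upright and cannot select its vertical glyph, which is the gap mentioned above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for g_unichar_iswide(): only a few approximate
 * Wide/Fullwidth ranges, not the real EastAsianWidth.txt table. */
static bool sketch_is_wide(uint32_t c)
{
    return (c >= 0x2FF0 && c <= 0x2FFB)   /* Ideographic Description Characters */
        || (c >= 0x3041 && c <= 0x30FF)   /* Hiragana and Katakana (includes U+30FC) */
        || (c >= 0x4E00 && c <= 0x9FFF)   /* CJK Unified Ideographs */
        || (c >= 0xFF01 && c <= 0xFF60)   /* Fullwidth ASCII variants and brackets */
        || (c >= 0xFFE0 && c <= 0xFFE6);  /* Fullwidth signs */
}

/* In vertical text, wide characters stay upright and narrow ones are
 * rotated 90 degrees. U+30FC is wide, so this keeps it upright; choosing
 * its vertical glyph is left to the font. */
static bool sketch_upright_in_vertical(uint32_t c)
{
    return sketch_is_wide(c);
}
```

Halfwidth forms such as U+FF71 fall outside the wide ranges, so they are rotated like Latin letters.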
Yes, that is probably a good way to solve it. The example text (大村ー!) is from a Japanese cartoon where it is printed vertically, so I am quite sure it is correct. Pango handles the mark correctly when it appears after a hiragana or katakana character, but for some reason not when it appears after a Han character. I think the rules should be as follows:

Base gravity           Gravity of the prolonged sound mark
PANGO_GRAVITY_SOUTH -> PANGO_GRAVITY_SOUTH
PANGO_GRAVITY_EAST  -> PANGO_GRAVITY_SOUTH
PANGO_GRAVITY_WEST  -> PANGO_GRAVITY_NORTH
PANGO_GRAVITY_NORTH -> PANGO_GRAVITY_NORTH
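The proposed rule can be written as a small C sketch. The enum is a hypothetical stand-in that mirrors the PangoGravity names (PANGO_GRAVITY_SOUTH and so on) so the example compiles without Pango, and sketch_gravity_for_char() is not a Pango function.

```c
#include <stdint.h>

/* Hypothetical stand-in mirroring the PangoGravity values used above. */
typedef enum {
    SKETCH_GRAVITY_SOUTH,
    SKETCH_GRAVITY_EAST,
    SKETCH_GRAVITY_NORTH,
    SKETCH_GRAVITY_WEST
} SketchGravity;

#define PROLONGED_SOUND_MARK 0x30FCu  /* KATAKANA-HIRAGANA PROLONGED SOUND MARK */

/* Returns the gravity for character c given the base gravity, applying the
 * proposed table to U+30FC only; all other characters keep the base. */
static SketchGravity sketch_gravity_for_char(uint32_t c, SketchGravity base)
{
    if (c != PROLONGED_SOUND_MARK)
        return base;
    switch (base) {
    case SKETCH_GRAVITY_EAST:
        return SKETCH_GRAVITY_SOUTH;
    case SKETCH_GRAVITY_WEST:
        return SKETCH_GRAVITY_NORTH;
    default:
        return base;  /* SOUTH and NORTH are unchanged */
    }
}
```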
OK, I'll let you know when I have something to test. I should probably also implement ambiguous-width handling (see http://www.unicode.org/reports/tr11/).
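For reference, UAX #11 says ambiguous-width characters should resolve to wide in an East Asian context and to narrow when the context cannot be established reliably. A minimal C sketch of that rule, using a tiny hypothetical excerpt of the East Asian Width data rather than the real EastAsianWidth.txt table; the names are illustrative only:

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { SK_EAW_NARROW, SK_EAW_WIDE, SK_EAW_AMBIGUOUS } SketchEaw;

/* Hypothetical, approximate excerpt of East Asian Width classes. */
static SketchEaw sketch_eaw(uint32_t c)
{
    switch (c) {
    case 0x00A7:  /* SECTION SIGN */
    case 0x00B0:  /* DEGREE SIGN */
    case 0x25CB:  /* WHITE CIRCLE */
        return SK_EAW_AMBIGUOUS;
    }
    if (c >= 0x3041 && c <= 0x30FF)  /* kana */
        return SK_EAW_WIDE;
    if (c >= 0x4E00 && c <= 0x9FFF)  /* CJK Unified Ideographs */
        return SK_EAW_WIDE;
    return SK_EAW_NARROW;
}

/* Ambiguous characters follow the context: wide in an East Asian context,
 * narrow otherwise (the UAX #11 default). */
static bool sketch_resolved_wide(uint32_t c, bool east_asian_context)
{
    SketchEaw w = sketch_eaw(c);
    if (w == SK_EAW_AMBIGUOUS)
        return east_asian_context;
    return w == SK_EAW_WIDE;
}
```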
I found out that no rotation is actually necessary. The problem occurs because the katakana-hiragana prolonged sound mark has script PANGO_SCRIPT_COMMON, so it simply inherits the script of the text around it. After a Han character, this means the OpenType script tag "kana" is not used when looking up the glyph, and a horizontal glyph is chosen instead. I also noticed some other characters with similar problems. I think the best solution would be for pango_script_for_unichar() to return either PANGO_SCRIPT_KATAKANA or PANGO_SCRIPT_HIRAGANA for these characters. This would also eliminate the need for some of the special katakana handling in pango_default_break().
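A simplified C sketch of why this happens. It assumes a tiny hypothetical script classifier (not pango_script_for_unichar()) and resolves each Common character to the script of the preceding non-Common character, which is a simplification of Pango's actual script iterator.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { SK_COMMON, SK_HAN, SK_HIRAGANA, SK_KATAKANA, SK_LATIN } SketchScript;

/* Hypothetical, partial classifier; U+30FC is Script=Common in Unicode. */
static SketchScript sketch_script_for_char(uint32_t c)
{
    if (c >= 0x3041 && c <= 0x3096)
        return SK_HIRAGANA;
    if ((c >= 0x30A1 && c <= 0x30FA) || (c >= 0x30FD && c <= 0x30FF))
        return SK_KATAKANA;
    if (c >= 0x4E00 && c <= 0x9FFF)
        return SK_HAN;
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
        return SK_LATIN;
    return SK_COMMON;  /* e.g. U+30FC, digits, punctuation */
}

/* Each Common character takes the script of the last non-Common character;
 * leading Common characters stay Common in this simplified version. */
static void sketch_resolve_scripts(const uint32_t *text, size_t n, SketchScript *out)
{
    SketchScript current = SK_COMMON;
    for (size_t i = 0; i < n; i++) {
        SketchScript s = sketch_script_for_char(text[i]);
        if (s == SK_COMMON)
            s = current;
        else
            current = s;
        out[i] = s;
    }
}
```

With 大村ー the mark resolves to Han, so glyph lookup would not use the "kana" tag; with オオムラー it resolves to Katakana.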
Created attachment 138896: Patch to fix the prolonged sound mark. This patch fixes the prolonged sound mark, as well as some other katakana/hiragana characters, by treating them as katakana.
Created attachment 138903: Patch to fix the prolonged sound mark. Sorry, I generated the previous patch badly; this one should be easier to apply.
Changing pango_script_for_unichar() is not feasible.
Created attachment 138942: Patch to fix the incorrectly rotated characters. This does not change anything in pango_script_for_unichar(), but still fixes the rotation of the prolonged sound mark and the wide letters. However, it treats ambiguous-width characters as narrow. I will probably try to implement handling of ambiguous-width characters next week, unless it has been implemented by then.
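A minimal C sketch of a width iterator in the spirit of this patch: it splits text into maximal runs of wide or narrow characters so each run can be given its own orientation. SketchWidthIter, its functions and the simplified sketch_is_wide() predicate are hypothetical; this is not the code from the attached patch. Because the predicate only reports a few wide ranges, ambiguous-width characters end up narrow, as in the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified width test (kana, CJK ideographs, fullwidth
 * ASCII); a real implementation would use g_unichar_iswide(). */
static bool sketch_is_wide(uint32_t c)
{
    return (c >= 0x3041 && c <= 0x30FF)
        || (c >= 0x4E00 && c <= 0x9FFF)
        || (c >= 0xFF01 && c <= 0xFF60);
}

/* Walks the text as maximal runs of all-wide or all-narrow characters. */
typedef struct {
    const uint32_t *text;
    size_t len;
    size_t start;  /* first index of the current run */
    size_t end;    /* one past the last index of the current run */
    bool wide;     /* width class of the current run */
} SketchWidthIter;

static void sketch_width_iter_init(SketchWidthIter *it, const uint32_t *text, size_t len)
{
    it->text = text;
    it->len = len;
    it->start = 0;
    it->end = 0;
    it->wide = false;
}

/* Advances to the next run; returns false once the text is exhausted. */
static bool sketch_width_iter_next(SketchWidthIter *it)
{
    it->start = it->end;
    if (it->start >= it->len)
        return false;
    it->wide = sketch_is_wide(it->text[it->start]);
    it->end = it->start + 1;
    while (it->end < it->len && sketch_is_wide(it->text[it->end]) == it->wide)
        it->end++;
    return true;
}
```

For "Jimの" the iterator yields a narrow run "Jim" (rotated in vertical text) followed by a wide run "の" (upright).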
Thanks, Jakob! The iterator is very close to what I had in mind. I'm not sure about the special-casing of those characters, however. Let's deal with the two issues separately.
Can you attach the test text for the screenshots please?
OK, I committed the bulk of the patch. Another current problem is that Common-script characters (say, Latin digits) next to CJK characters are assigned the CJK script and hence displayed vertically, even though they shouldn't be. I wonder whether I could abuse the width property to detect that, but I'm afraid non-CJK vertical scripts would suffer then.
Reading the Unicode report again, it seems the recommendation is not to do anything special for ambiguous-width characters in display (although we do handle them specially in gnome-terminal). So the remaining issue is how to handle Script=Common characters reliably.
I still think U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK should be handled by the font's vrt2 GSUB feature.
Sorry, 'vert', not 'vrt2'.
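A minimal C sketch of why the script tag matters for 'vert'. OpenType tags are four ASCII bytes packed big-endian into a 32-bit value, and GSUB features are reached through the font's per-script entries. The feature record list below is hypothetical: it models a font that registers 'vert' only under 'kana', in which case a run shaped as Han ('hani') would not get the vertical glyph. Real fonts may register the feature under several scripts.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Packs four ASCII characters into a 32-bit OpenType tag (big-endian). */
#define SKETCH_OT_TAG(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Hypothetical flattened view of a font's GSUB: which feature is
 * registered under which script tag. */
typedef struct {
    uint32_t script;   /* e.g. 'kana' or 'hani' */
    uint32_t feature;  /* e.g. 'vert' */
} SketchFeatureRecord;

/* True if the font offers `feature` for runs shaped with `script`. */
static bool sketch_font_has_feature(const SketchFeatureRecord *recs, size_t n,
                                    uint32_t script, uint32_t feature)
{
    for (size_t i = 0; i < n; i++)
        if (recs[i].script == script && recs[i].feature == feature)
            return true;
    return false;
}
```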
OK, let's close this now. The Common-character issue is tracked in bug 410169.
The text used for the screenshots was "CDEはJimの会社だ\n大村ー!\n明道\n⿰⿺". By the way, I am glad you liked the implementation.
The prolonged sound mark is still not fixed. I just built the current version from git://git.gnome.org/pango and tested it with the following command:

pango-view "--text=おおむらー オオムラー 大村ー" --rotate=270 --gravity=east --font=32

The last prolonged sound mark is not displayed correctly; it should be displayed with a vertical glyph, like the first two. It could be fixed by changing pango/pango-script.c as in my last patch. Apart from the prolonged sound mark, everything works fine.
Jakob, I understand what you are saying, but I think the rest is font bugs. Or rather, it's a general shortcoming of the OpenType shaping model. It definitely should not be fixed by patching pango-script.c.
Yes, you are right, it should not be fixed in pango-script.c. But it is still a bug that horizontal glyphs are used for wide characters in vertical text when the font has vertical glyphs (I tested with Sazanami Mincho, which has them).