GNOME Bugzilla – Bug 378001
Fix Thai and Lao shaping for minority languages
Last modified: 2012-08-18 17:49:55 UTC
Please describe the problem:
Minority languages that use the Thai and Lao scripts often use sequences not currently supported by the Thai/Lao shaper. The enclosed patch aims to fix the state tables for known problems. A stronger solution would be to change the engine so that any sequence producing a distinguishable rendered difference can be stored and rendered. I do not feel that a rendering engine is the place to enforce spelling conventions.

Steps to reproduce:
1. Enter any of the sequences listed in http://scripts.sil.org/ThaiLaoSeq and notice that diacritics are not associated with base characters.

โฺอฺ มูํย มูํย ลฺือ แต็่ง เจฺ่ง เจฺ่อ เปรฺิ่ห์ โจ๊่ เปฺี่ย โฺทร ຣຽໍງ ພໍ້ຽກ ພ້ຽໍກ ບຽູ ປ້ຽານ ກ້໋ານ ກັ່໋ງ

Actual results:
Expected results:
Does this happen every time?
Other information:
Created attachment 77010 [details] [review] patch to modules/ to fix Thai/Lao shaping
Thep, can you review this please.
Without looking deeply into the patch, I'd like to explain why the sequence check is there in the rendering engine. In the case of Thai, as you may know, combining characters are not arbitrarily stacked. For typographical quality, for example, tone marks need a smaller size when stacked over an upper vowel than when stacked directly over the base consonant. Moreover, making invalid sequences visibly distinct to the human eye makes the errors perceivable and helps prevent search misses. There are many reasons behind this filtering. So, I may not agree with the "stronger solution".

While support for minority scripts is an important feature, covering them should not interfere with current features for majority scripts. The case of minority scripts is recognized by WTT 2.0 implementers, but no action has been taken so far, because we lack information. It's good to have feedback from an expert like this.

These cases are obvious, and OK to fix:
โฺอฺ มูํย ลฺือ แต็่ง เจฺ่ง เจฺ่อ เปรฺิ่ห์ เปฺี่ย โฺทร

But I have a question about this one:
โจ๊่
Do double tone marks apply over an upper vowel as well? If so, we may need specially designed fonts to render it.

For the case of Lao, I had better consult a native speaker for comment.

And, after all, the state machine needs to be fixed in places other than pango as well. The IM module for GTK+, for example, also needs to be kept in sync, as do SCIM, etc. It may be time to consider a shared library design.
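To make the kind of sequence check discussed above concrete, here is a minimal sketch of a pairwise check in Python. The class names, the code point ranges assigned to each class, and the `ALLOWED` table are simplifying assumptions made for illustration; they are not the actual WTT 2.0 tables or the tables in the pango Thai module, and `thai_class` and `rejected_marks` are hypothetical helpers.

```python
# Illustrative only: a simplified classification of Thai code points.
# This is an assumption for the sketch, NOT the real WTT 2.0 or pango table.
def thai_class(ch):
    cp = ord(ch)
    if 0x0E01 <= cp <= 0x0E2E:
        return "CONS"          # base consonants
    if cp == 0x0E31 or 0x0E34 <= cp <= 0x0E37:
        return "ABOVE_VOWEL"   # mai han-akat, sara i .. sara uee
    if cp in (0x0E38, 0x0E39):
        return "BELOW_VOWEL"   # sara u, sara uu
    if cp == 0x0E3A:
        return "BELOW_DIAC"    # phinthu
    if cp == 0x0E47 or 0x0E4C <= cp <= 0x0E4E:
        return "ABOVE_DIAC"    # maitaikhu, thanthakhat, nikhahit, yamakkan
    if 0x0E48 <= cp <= 0x0E4B:
        return "TONE"          # mai ek .. mai chattawa
    return "OTHER"             # leading vowels, following vowels, non-Thai

COMBINING = {"ABOVE_VOWEL", "BELOW_VOWEL", "BELOW_DIAC", "ABOVE_DIAC", "TONE"}

# (previous class, current class) pairs the sketch lets compose.
# Deliberately strict, to mimic a majority-language-only table.
ALLOWED = {
    ("CONS", "ABOVE_VOWEL"),
    ("CONS", "BELOW_VOWEL"),
    ("CONS", "BELOW_DIAC"),
    ("CONS", "ABOVE_DIAC"),
    ("CONS", "TONE"),
    ("ABOVE_VOWEL", "TONE"),
    ("BELOW_VOWEL", "TONE"),
}

def rejected_marks(text):
    """Return indices of combining marks that the pair table refuses to attach."""
    rejected = []
    prev = "OTHER"
    for i, ch in enumerate(text):
        cur = thai_class(ch)
        if cur in COMBINING and (prev, cur) not in ALLOWED:
            rejected.append(i)
        prev = cur
    return rejected

# THO THAHAN, SARA II, MAI EK: an ordinary stack, nothing rejected.
print(rejected_marks("\u0e17\u0e35\u0e48"))
# SARA AE, TO TAO, MAITAIKHU, MAI EK, NGO NGU: the mai ek at index 3 is rejected.
print(rejected_marks("\u0e41\u0e15\u0e47\u0e48\u0e07"))
```

Under this strict table, a tone mark following maitaikhu, phinthu, or another tone mark is refused. Supporting the minority-language sequences in a table of this shape would mean adding pairs to `ALLOWED`, not changing the loop.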
Sak, would you have any comment about this?
In this case: โจ๊่, the mai trii is acting as a vowel, so the dual tone sequence does not occur over another vowel. This isn't covered by the existing state machine model because it involves looking at 3 characters to decide if a break is necessary.

As to whether a sequence should be considered illegal because a font may not be designed to handle it well: I would suggest that it is not the place of an engine like pango to say that a sequence is illegal and therefore cannot be rendered at all, just because it may not render well. Just because a font may not be designed to render mai trii mai ek beautifully does not mean that it should not be renderable in *any* font. We need to come up with a sensible set of limitations and requirements on a font that will allow as much as possible to be rendered. Fonts may then be created that provide better support for some languages than others. This is only to be expected.

As to whether a shaping engine should do spell checking (in effect), I would suggest it should not. Qz never occurs in English, but it is not the shaping engine's job to highlight the fact. It is the job of a spell checker.

Of the sequences presented, the one that worries me the most is: ກັ່໋ງ which in Thai would be กั๋่ง
An OT shaper would be able to handle this, but a dumb font-based shaper would probably need to mark this as illegal.
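The point about needing three characters can be sketched as follows. This is one possible reading of the comment, written as a hypothetical Python helper: a second tone mark is accepted only when the first tone mark sits directly on a consonant. The rule, the class names, and the function `second_tone_ok` are assumptions for illustration and are not taken from the attached patch.

```python
# Illustrative only: a reduced classification, enough for the three-character rule.
def thai_class(ch):
    cp = ord(ch)
    if 0x0E01 <= cp <= 0x0E2E:
        return "CONS"
    if cp == 0x0E31 or 0x0E34 <= cp <= 0x0E37:
        return "ABOVE_VOWEL"
    if 0x0E48 <= cp <= 0x0E4B:
        return "TONE"
    return "OTHER"

def second_tone_ok(text, i):
    """Decide whether the tone mark at index i may stack on the tone at i - 1.

    A pair table sees only (TONE, TONE). This check also looks one
    character further back, so it needs a window of three characters.
    """
    if i < 2:
        return False
    window = tuple(thai_class(ch) for ch in text[i - 2:i + 1])
    # Assumed rule: allowed only when the first tone sits directly on a consonant.
    return window == ("CONS", "TONE", "TONE")

# SARA O, CHO CHAN, MAI TRI, MAI EK: the first tone sits on the consonant.
print(second_tone_ok("\u0e42\u0e08\u0e4a\u0e48", 3))   # True
# CHO CHAN, SARA II, MAI TRI, MAI EK: an upper vowel is already present.
print(second_tone_ok("\u0e08\u0e35\u0e4a\u0e48", 3))   # False
```

A state machine keyed only on the previous character cannot tell these two inputs apart, which is why the comment describes the case as falling outside the existing model.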
How does this all fit in with the new HarfBuzz rendering?
We've merged the HarfBuzz branch. Closing as obsolete.