GNOME Bugzilla – Bug 145275
Font selection for space characters
Last modified: 2009-10-02 20:12:15 UTC
To reproduce:
1. Log into GNOME in a Japanese locale.
2. Start gedit.
3. Press Ctrl+Space and switch to Kanji mode.
4. Press the Space key.
The character is then displayed corrupted.
Created attachment 29143 pango-context.c patch patch for head
Can you be more precise than "corrupt"?
The undefined-glyph box is displayed when only spaces are input; e.g. Unicode 0x3000, the Japanese space, is shown as a square box with 3000 inside.
The root cause is the same as bug 138171's. As a full-width character, 0x3000 should be assigned to PANGO_SCRIPT_HAN in pango-script-table.h so that a CJK font is selected to render the character. Otherwise, the problem occurs in CJK locales because Pango tries to use a Latin font instead. Anyway, I am attaching a simple patch, since you said the header file isn't editable.
Created attachment 29453 one simple patch
I'm not really sure how the patch would make a difference ... I think the issue here is in itemize_state_process_run():

    if (!g_unichar_isgraph (wc))
      {
        shape_engine = NULL;
        font = NULL;
      }

The shape engine and font for non-printing characters come from surrounding text and ignore whether the font has the character at all. Which is typically right: if someone uses THREE-PER-EM SPACE, you don't want to load some random font in the system that happens to have this character, you just want to skip 1/3 em of space.

I'm not sure about IDEOGRAPHIC SPACE ... whether we should give it the same treatment or whether we should actually look for a font that supports the character. But in any case, for any character that we do use the above handling for, we need to make sure that it renders as a space for all fonts.
You are right. There is not much difference between the two patches, except that my patch applies only to U+3000. Using g_unichar_isprint would be better. If IDEOGRAPHIC SPACE is rendered as a full-width space in CJK locales, then either skipping a full-width space or looking for a font that has the character would be OK. But I am not sure whether any non-CJK locale uses IDEOGRAPHIC SPACE: if I input it with the hex-code input method in en_US.UTF-8, what happens?
Created attachment 31535 My suggested patch
After thinking about it some, I think for U+3000, just making it go through the printing-glyph codepath is right. While for an M/3 space or whatever, skipping 1/3 of an em is right, for an ideographic space, we really want to get an ideographic space that matches the ideographic font for the glyph. There's still a remaining component of this, which is to implement fallback rendering for other spaces.
Mon Sep 13 10:18:51 2004 Owen Taylor <otaylor@redhat.com> * pango/pango-context.c (itemize_state_process_run): Except ideographic spaces from the !g_unichar_isgraph() test. (#145275, Federic Zhang)
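The decision described in this ChangeLog entry can be sketched as follows. This is a minimal standalone illustration, not the committed patch: `is_graph_stand_in()` is a hypothetical stand-in for `g_unichar_isgraph()` that only knows a handful of space code points, and `uses_own_font()` is a made-up name for the choice made inside `itemize_state_process_run()`.

```c
#include <stdbool.h>
#include <stdint.h>

#define IDEOGRAPHIC_SPACE 0x3000u

/* Hypothetical stand-in for g_unichar_isgraph(): treats a few known
 * space characters as non-graphic and everything else as graphic. */
bool
is_graph_stand_in (uint32_t wc)
{
  if (wc == 0x0020u || wc == 0x00A0u || wc == IDEOGRAPHIC_SPACE)
    return false;
  if (wc >= 0x2000u && wc <= 0x200Au)
    return false;
  return true;
}

/* Returns true when the character should get its own font lookup,
 * false when it should inherit font and shaper from surrounding text.
 * U+3000 is excepted so that it is matched against an ideographic font. */
bool
uses_own_font (uint32_t wc)
{
  return is_graph_stand_in (wc) || wc == IDEOGRAPHIC_SPACE;
}
```

Under these assumptions, `uses_own_font (0x3000)` is true while `uses_own_font (0x2004)` (THREE-PER-EM SPACE) is false, which matches the behaviour discussed in the preceding comments.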
*** Bug 164630 has been marked as a duplicate of this bug. ***
*** Bug 311473 has been marked as a duplicate of this bug. ***
I actually went on and committed a patch similar to the first patch in this bug, so space characters go through the same path as graphic characters now. That works better in the meantime, until we implement what this bug is about. This is in Pango 1.14.2.
I take it that the point of this change is to support fonts that have random junk at space codepoints for MathML fonts? It doesn't seem to me that you want fallback in that case ... if the exactly specified font has something for THREE-PER-EM space, sure using that glyph is probably fine. But searching all fonts on the system for something that happens to have a glyph for THREE-PER-EM SPACE seems very wrong to me.
(In reply to comment #13)
> I take it that the point of this change is to support fonts that have random
> junk at space codepoints for MathML fonts?

No no. Not at all. It's just that the MathML is using the cmex font, and that font doesn't have a glyph for U+2009 THIN SPACE, but the tests were using that for something like $\int f(x)\thinspace dx$. Without this change, the space was assigned to cmex and showing up as a hexbox. With this change, it's being picked up from DejaVu LGC Sans which is a pretty good font as far as coverage goes. And it really looks good. If you have fonts that have very different metrics, you are screwed anyway.

> It doesn't seem to me that you want fallback in that case ... if the exactly
> specified font has something for THREE-PER-EM space, sure using that glyph
> is probably fine.
>
> But searching all fonts on the system for something that happens to have
> a glyph for THREE-PER-EM SPACE seems very wrong to me.

Having a good default font like DejaVu is enough to guarantee an improved experience with the patch. That's why I thought it's better to have it rather than not. If you still prefer to not have something like this in Pango, I'll go on and implement the correct behavior, but that requires changes in so many places in individual backends and possibly needs an allocation scheme in the glyph space (say 0x02000001 is the synthesized THIN SPACE, 0x02000002 is ...) that I'm not quite ready to do just yet.
OK, sounds plausible. As far as I can remember, my concerns with using the standard shaping for spaces were:

 - misc-fixed or similar bitmap fonts were the most comprehensive fonts on many systems at that point. Using a non-scalable bitmap for a space doesn't work well. (Every scalable font has the same width for THREE-PER-EM, but not bitmap fonts.)
 - Excess item breaks and shaper changes cause considerable inefficiency.

I don't think you'd actually need glyph space allocation though. Either:

 * Use PANGO_GLYPH_EMPTY, but then all shapers need to be changed to call some sort of shape_space() function.
 * Just piggyback off of the missing glyph infrastructure. It even makes some sense ... drawing a hex box for THIN SPACE is just silly ... we know how to draw (or well, not draw) something better than that for THIN SPACE.
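The point that a scalable font gives the same width for a space like THREE-PER-EM can be illustrated with a small sketch. The function name `synthesized_space_advance()` is hypothetical and not part of Pango; it covers only the spaces whose names fix their width as a simple fraction of an em, and it deliberately leaves out THIN SPACE, HAIR SPACE and the other spaces whose width depends more on the font.

```c
#include <stdint.h>

/* Advance, in the same units as em_size, for the Unicode spaces whose
 * width is a fixed fraction of an em. Returns -1 for any other
 * character, meaning "not synthesized here; ask the font". */
int
synthesized_space_advance (uint32_t wc, int em_size)
{
  switch (wc)
    {
    case 0x2002: return em_size / 2;  /* EN SPACE */
    case 0x2003: return em_size;      /* EM SPACE */
    case 0x2004: return em_size / 3;  /* THREE-PER-EM SPACE */
    case 0x2005: return em_size / 4;  /* FOUR-PER-EM SPACE */
    case 0x2006: return em_size / 6;  /* SIX-PER-EM SPACE */
    default:     return -1;
    }
}
```

A shape_space()-style helper, as suggested in the first option, could call something like this instead of looking up a glyph; how it would actually be wired into the shapers is not specified in this thread.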
*** Bug 349428 has been marked as a duplicate of this bug. ***
(In reply to comment #15)
> * Just piggyback off of the missing glyph infrastructure. It even makes
> some sense ... drawing a hex box for THIN SPACE is just silly ... we know
> how to draw (or well, not draw) something better than that for THIN SPACE.

Yeah, this makes sense. We just need to change the backend glyph_extents and render functions that switch on the UNKNOWN glyphs. Makes perfect sense after I move the glyph_extents from pangocairo-fcfont.c into pangocairo-font.c
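A rough sketch of what "switching on the UNKNOWN glyphs" could look like in a backend follows. It is an illustration only: `UNKNOWN_FLAG` is a local constant assumed to mirror Pango's `PANGO_GLYPH_UNKNOWN_FLAG`, and `is_synthesizable_space()`, `render_action_for_glyph()` and the `RenderAction` enum are hypothetical names, not Pango API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local constant assumed to mirror Pango's PANGO_GLYPH_UNKNOWN_FLAG. */
#define UNKNOWN_FLAG 0x10000000u

typedef enum { DRAW_GLYPH, DRAW_HEX_BOX, DRAW_NOTHING } RenderAction;

/* Hypothetical helper: is this one of the spaces we know how to
 * "draw" (that is, leave blank) without a glyph from the font? */
bool
is_synthesizable_space (uint32_t wc)
{
  return (wc >= 0x2000u && wc <= 0x200Au) || wc == 0x3000u;
}

/* Decide what a backend's render function would do for a glyph ID. */
RenderAction
render_action_for_glyph (uint32_t glyph)
{
  if (!(glyph & UNKNOWN_FLAG))
    return DRAW_GLYPH;                 /* ordinary glyph from the font */

  uint32_t wc = glyph & ~UNKNOWN_FLAG; /* character the font lacked */
  return is_synthesizable_space (wc) ? DRAW_NOTHING : DRAW_HEX_BOX;
}
```

With this shape, a missing THIN SPACE (`UNKNOWN_FLAG | 0x2009`) yields `DRAW_NOTHING` rather than a hex box, while a missing ideograph still falls through to the hex box. The matching glyph_extents change would return a blank advance for the same set of characters.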
*** Bug 416526 has been marked as a duplicate of this bug. ***
Any update on this?
This is mostly done these days in that we search for a font that covers the space character now (except for the ASCII space). The remaining issue is the same as in bug 63633.
*** This bug has been marked as a duplicate of 63633 ***
I confirmed the original bug is fixed in the latest pango. I'll remove the Sun patch pango-01-fullwidth-space.diff.
I'm not convinced this is completely fixed. In particular, I am still seeing spaces that are too tall:

font=Sans 10
U200a --> 1024x17408
U2009 --> 3072x17408
U2008 --> 4096x17408
U0020 --> 4096x16384
U2006 --> 2048x17408

I don't think the unusual ones should be taller than U2000.
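To make the figures above easier to compare, here is a small conversion sketch. It assumes the numbers are width x height in Pango units, which the comment does not state explicitly; `UNITS_PER_DEVICE_UNIT` is a local constant assumed to mirror Pango's `PANGO_SCALE`, and `units_to_device()` is a hypothetical helper name.

```c
/* Assumed to mirror Pango's PANGO_SCALE: Pango units per device unit. */
#define UNITS_PER_DEVICE_UNIT 1024

/* Convert a length in Pango units to device units. */
double
units_to_device (int units)
{
  return (double) units / UNITS_PER_DEVICE_UNIT;
}
```

Under that assumption, 16384 converts to 16 and 17408 converts to 17, so the ASCII space would be 16 units tall and the other spaces listed would be one unit taller.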
If your main font doesn't cover them, we search fallback fonts. That's all. Please file a new bug. The original bug is fixed.
Behdad: I did -- bug 416526 -- but you duped it to this one.
Ok, reopened that one.