Bug 145275 - Font selection for space characters
Status: RESOLVED FIXED
Product: pango
Classification: Platform
Component: general
Version: unspecified
Hardware: Other Linux
Importance: High normal
Target Milestone: Medium fix
Assigned To: pango-maint
QA Contact: pango-maint
Duplicates: 164630 311473 349428
Depends on:
Blocks:
Reported: 2004-07-01 22:03 UTC by suresh
Modified: 2009-10-02 20:12 UTC
See Also:
GNOME target: ---
GNOME version: 2.9/2.10


Attachments
pango-context.c patch (1.22 KB, patch)
2004-07-01 22:06 UTC, suresh
Status: none
one simple patch (549 bytes, patch)
2004-07-12 03:34 UTC, federic zhang
Status: none
My suggested patch (1.81 KB, patch)
2004-09-13 19:05 UTC, Owen Taylor
Status: committed

Description suresh 2004-07-01 22:03:19 UTC
To reproduce:
1. Log into gnome on Japanese locales.
2. Invoke gedit.
3. Input 'Ctrl + Space' and change to Kanji-mode.
4. Input 'Space' key.

Then the char is corrupt
Comment 1 suresh 2004-07-01 22:06:05 UTC
Created attachment 29143
pango-context.c patch

patch for head
Comment 2 Owen Taylor 2004-07-02 06:34:14 UTC
Can you be more precise than "corrupt"?
Comment 3 suresh 2004-07-02 07:25:13 UTC
The undefined glyph is displayed when only spaces are input. E.g. Unicode
0x3000, the Japanese space, shows up as a square box with 3000 inside.

Comment 4 federic zhang 2004-07-12 03:33:34 UTC
The root cause is the same as in bug 138171. As a full-width character, 0x3000
should be assigned to PANGO_SCRIPT_HAN in pango-script-table.h to make sure
that a CJK font is selected to render the character. Otherwise, the problem
will happen in CJK locales, as pango tries to use a Latin font instead. Anyway,
I am attaching one simple patch, since you said the header file isn't editable.
Comment 5 federic zhang 2004-07-12 03:34:52 UTC
Created attachment 29453
one simple patch
Comment 6 Owen Taylor 2004-07-26 20:38:06 UTC
I'm not really sure how the patch would make a difference ... I think the
issue here is in itemize_state_process_run():

      if (!g_unichar_isgraph (wc))
        {
          shape_engine = NULL;
          font = NULL;
        }

The shape engine and font for non-printing characters come from surrounding
text and ignore whether the font has the character at all. Which is typically
right: if someone uses THREE-PER-EM SPACE, you don't want to load some
random font in the system that happens to have this character, you just
want to skip 1/3 em of space.

I'm not sure about IDEOGRAPHIC SPACE ... whether we should give it the
same treatment or whether we should actually look for a font that
supports the character. But in any case, for any character that we
do use the above handling for, we need to make sure that it renders
as a space for all fonts.
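
A minimal sketch of the inheritance described above, written without Pango so it stands alone. This is not the code in pango-context.c: is_graph_stub(), pick_font_stub() and the integer font ids are hypothetical stand-ins for g_unichar_isgraph() and real font selection.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for g_unichar_isgraph(): only knows about the
 * space characters discussed in this bug. */
static int is_graph_stub(uint32_t wc)
{
    if (wc == 0x0020 || wc == 0x00A0 || wc == 0x3000)
        return 0;
    if (wc >= 0x2000 && wc <= 0x200A)
        return 0;
    return 1;
}

/* Hypothetical font picker: id 2 for a CJK font, id 1 for a Latin font. */
static int pick_font_stub(uint32_t wc)
{
    return (wc >= 0x2E80 && wc <= 0x9FFF) ? 2 : 1;
}

/* Fill fonts[i] with a font id for text[i]. Non-graphic characters do no
 * lookup of their own: they take the font of the preceding text, or of the
 * following text when they lead the string. 0 means there was no
 * surrounding text to inherit from. */
void assign_fonts(const uint32_t *text, size_t len, int *fonts)
{
    int current = 0;
    size_t i, first = 0;

    for (i = 0; i < len; i++) {
        if (is_graph_stub(text[i]))
            current = pick_font_stub(text[i]);
        fonts[i] = current;
    }

    /* Leading non-graphic characters borrow the first font that was found. */
    while (first < len && fonts[first] == 0)
        first++;
    if (first < len)
        for (i = 0; i < first; i++)
            fonts[i] = fonts[first];
}
```

With the input 'a', U+3000, U+65E5, the ideographic space ends up with the Latin font id, whether or not that font has a glyph for it, which matches the hex box in the report.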
Comment 7 federic zhang 2004-07-29 07:21:18 UTC
You are right. There is not much difference between the two patches, except
that my patch only applies to U+3000. Using g_unichar_isprint would be better.

If IDEOGRAPHIC SPACE is rendered as a full-width space in CJK locales, either
skipping a full-width space or looking for a font that has the character would
be OK. But I am not sure whether any non-CJK locale will use IDEOGRAPHIC SPACE.
If I input it with the hexcode input method in en_US.UTF-8, what happens?
Comment 8 Owen Taylor 2004-09-13 19:05:00 UTC
Created attachment 31535
My suggested patch

After thinking about it some, I think for U+3000, just making it
go through the printing-glyph codepath is right.

While for a M/3 space or whatever, skipping 1/3 of an em is right, 
for an ideographic space, we really want to get an ideographic space 
that matches the ideographic font for the glyph.

There's still a remaining component of this, which is to implement
fallback rendering for other spaces.
Comment 9 Owen Taylor 2004-09-20 17:34:58 UTC
Mon Sep 13 10:18:51 2004  Owen Taylor  <otaylor@redhat.com>
 
        * pango/pango-context.c (itemize_state_process_run):
        Except ideographic spaces from the !g_unichar_isgraph() test.
        (#145275, Federic Zhang)
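
As a rough illustration of what that ChangeLog entry describes (not the committed diff itself), the test can be thought of as a predicate with one exemption. is_graph_stub() and uses_printing_path() are hypothetical names; the stub only approximates g_unichar_isgraph() for the characters in this bug.

```c
#include <stdint.h>

#define IDEOGRAPHIC_SPACE 0x3000u

/* Hypothetical stand-in for g_unichar_isgraph(). */
static int is_graph_stub(uint32_t wc)
{
    if (wc == 0x0020 || wc == 0x00A0 || wc == 0x3000)
        return 0;
    if (wc >= 0x2000 && wc <= 0x200A)
        return 0;
    return 1;
}

/* Returns 1 when the character should get its own font and shaper lookup,
 * 0 when it should inherit them from the surrounding text.
 * U+3000 is exempted from the non-graphic treatment. */
int uses_printing_path(uint32_t wc)
{
    return is_graph_stub(wc) || wc == IDEOGRAPHIC_SPACE;
}
```

Under this sketch U+3000 is looked up like a printing character, while U+2004 THREE-PER-EM SPACE and the ASCII space still inherit from their neighbours.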
Comment 10 Owen Taylor 2005-01-19 21:28:18 UTC
*** Bug 164630 has been marked as a duplicate of this bug. ***
Comment 11 Owen Taylor 2005-07-25 13:05:51 UTC
*** Bug 311473 has been marked as a duplicate of this bug. ***
Comment 12 Behdad Esfahbod 2006-08-22 23:43:38 UTC
I actually went on and committed a patch similar to the first patch in this bug, so space characters go through the same path as graphic characters now.  That works better in the meantime, until we implement what this bug is about.  This is in Pango 1.14.2.
Comment 13 Owen Taylor 2006-08-23 03:21:03 UTC
I take it that the point of this change is to support fonts that have random
junk at space codepoints for MathML fonts?

It doesn't seem to me that you want fallback in that case ... if the exactly
specified font has something for THREE-PER-EM space, sure using that glyph
is probably fine.

But searching all fonts on the system for something that happens to have 
a glyph for THREE-PER-EM SPACE seems very wrong to me.
Comment 14 Behdad Esfahbod 2006-08-23 03:48:47 UTC
(In reply to comment #13)
> I take it that the point of this change is to support fonts that have random
> junk at space codepoints for MathML fonts?

No no.  Not at all.  It's just that the MathML is using the cmex font, and that font doesn't have a glyph for U+2009 THIN SPACE, but the tests were using that for something like $\int f(x)\thinspace dx$.  Without this change, the space was assigned to cmex and showing up as a hexbox.  With this change, it's being picked up from DejaVu LGC Sans which is a pretty good font as far as coverage goes.  And it really looks good.  If you have fonts that have very different metrics, you are screwed anyway.

> It doesn't seem to me that you want fallback in that case ... if the exactly
> specified font has something for THREE-PER-EM space, sure using that glyph
> is probably fine.
> 
> But searching all fonts on the system for something that happens to have 
> a glyph for THREE-PER-EM SPACE seems very wrong to me.

Having a good default font like DejaVu is enough to guarantee an improved experience with the patch.  That's why I thought it's better to have it rather than not.

If you still prefer to not have something like this in Pango, I'll go on and implement the correct behavior, but that requires changes in so many places in individual backends and possibly needs an allocation scheme in the glyph space (say 0x02000001 is the synthesized THIN SPACE, 0x02000002 is ...) that I'm not quite ready to do just yet.
Comment 15 Owen Taylor 2006-08-23 13:09:14 UTC
OK, sounds plausible. As far as I can remember, my concerns with using
the standard shaping for spaces were:

 - misc-fixed or similar bitmap fonts were the most comprehensive fonts
   on many systems at that point. Using a non-scalable bitmap for a space
   doesn't work well. (Every scalable font has the same width for THREE-PER-EM, but
   not bitmap fonts.)

 - Excess item breaks and shaper changes cause considerable inefficiency.

I don't think you'd actually need glyph space allocation though. 
Either:

* Use PANGO_GLYPH_EMPTY, but then all shapers need to be changed to 
  call some sort of shape_space() function.

* Just piggyback off of the missing glyph infrastructure. It even makes
  some sense ... drawing a hex box for THIN SPACE is just silly ... we know
  how to draw (or well, not draw) something better than that for THIN SPACE.
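
A sketch of the second option, assuming the backend's unknown-glyph branch can ask for a synthesized advance before falling back to the hex box. synthesized_space_advance() is a hypothetical helper, not a Pango function. The fractions for the fixed-width spaces follow their Unicode names; the THIN SPACE and HAIR SPACE values are assumptions, since those widths vary by font.

```c
#include <stdint.h>

/* Returns the advance to use for a space character the font lacks, in the
 * same units as em_size, or -1 if wc is not a space this sketch knows how
 * to synthesize (the caller would then draw the usual hex box). */
int synthesized_space_advance(uint32_t wc, int em_size)
{
    switch (wc) {
    case 0x2000: /* EN QUAD */
    case 0x2002: /* EN SPACE */
        return em_size / 2;
    case 0x2001: /* EM QUAD */
    case 0x2003: /* EM SPACE */
        return em_size;
    case 0x2004: /* THREE-PER-EM SPACE */
        return em_size / 3;
    case 0x2005: /* FOUR-PER-EM SPACE */
        return em_size / 4;
    case 0x2006: /* SIX-PER-EM SPACE */
        return em_size / 6;
    case 0x2009: /* THIN SPACE: 1/5 em is an assumption */
        return em_size / 5;
    case 0x200A: /* HAIR SPACE: 1/10 em is an assumption */
        return em_size / 10;
    default:
        return -1;
    }
}
```

Nothing is drawn for these characters; only the advance matters, so no glyph from any font is needed.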
Comment 16 sangu 2006-08-23 13:23:37 UTC
*** Bug 349428 has been marked as a duplicate of this bug. ***
Comment 17 Behdad Esfahbod 2006-08-23 16:40:42 UTC
(In reply to comment #15)

> * Just piggyback off of the missing glyph infrastructure. It even makes
>   some sense ... drawing a hex box for THIN SPACE is just silly ... we know
>   how to draw (or well, not draw) something better than that for THIN SPACE.

Yeah, this makes sense.  We just need to change the backend glyph_extents and render functions that switch on the UNKNOWN glyphs.

Makes perfect sense after I move the glyph_extents from pangocairo-fcfont.c into pangocairo-font.c
Comment 18 Behdad Esfahbod 2007-03-12 22:13:50 UTC
*** Bug 416526 has been marked as a duplicate of this bug. ***
Comment 19 Brian Cameron 2007-06-26 08:22:10 UTC
Any update on this?
Comment 20 Behdad Esfahbod 2007-06-28 14:46:07 UTC
This is mostly done these days in that we search for a font that covers the space character now (except for the ASCII space).

The remaining issue is the same as in bug 63633.

*** This bug has been marked as a duplicate of 63633 ***
Comment 21 Takao Fujiwara 2007-07-12 11:53:32 UTC
I confirmed the original bug is fixed in the latest pango.
I'll remove the Sun patch pango-01-fullwidth-space.diff.
Comment 22 Morten Welinder 2009-10-02 17:45:46 UTC
I'm not convinced this is completely fixed.  In particular, I am still
seeing spaces that are too tall:

font=Sans 10
U200a --> 1024x17408
U2009 --> 3072x17408
U2008 --> 4096x17408
U0020 --> 4096x16384
U2006 --> 2048x17408

I don't think the unusual ones should be taller than U0020.
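
For reading the figures above: assuming they are in Pango units (1024 per device unit, the value of PANGO_SCALE), a small helper converts them to pixels. units_to_pixels() and SCALE are hypothetical names used for illustration; the rounding is to the nearest pixel.

```c
#define SCALE 1024 /* assumed to mirror PANGO_SCALE */

/* Convert a non-negative extent in 1/1024 units to whole pixels,
 * rounding to the nearest pixel. */
int units_to_pixels(int units)
{
    return (units + SCALE / 2) / SCALE;
}
```

Under that assumption, 1024x17408 for U200a is 1x17 pixels, and 4096x16384 for U0020 is 4x16 pixels, so the unusual spaces come out one pixel taller than the ASCII space.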
Comment 23 Behdad Esfahbod 2009-10-02 20:03:09 UTC
If your main font doesn't cover them, we search fallback fonts.  That's all.  Please file a new bug.  The original bug is fixed.
Comment 24 Morten Welinder 2009-10-02 20:05:31 UTC
Behdad: I did -- bug 416526 -- but you duped it to this one.
Comment 25 Behdad Esfahbod 2009-10-02 20:12:15 UTC
Ok, reopened that one.