Bug 79071 - yelp can't show properly localized man page.
Status: RESOLVED FIXED
Product: yelp
Classification: Applications
Component: XSLT
Version: 2.3.x
Hardware: Other other
Importance: High major
Target Milestone: ---
Assigned To: Mikael Hallendal
QA Contact: Yelp maintainers
Depends on:
Blocks: 83076
 
 
Reported: 2002-04-18 10:21 UTC by Young-Ho Cha
Modified: 2010-04-29 22:38 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
korean man page example (ls) (4.15 KB, application/octet-stream)
2002-05-22 05:46 UTC, Young-Ho Cha
Patch with incomplete fix :( (2.46 KB, patch)
2002-05-28 22:48 UTC, Federico Mena Quintero
man2html patch against libgnome 2.0.5 (2.61 KB, patch)
2002-09-30 14:24 UTC, Young-Ho Cha

Description Young-Ho Cha 2002-04-18 10:21:13 UTC
This looks similar to http://bugzilla.gnome.org/show_bug.cgi?id=47548

yelp shows localized man pages with broken characters.

I think most localized man pages are written in a local charset, not UTF-8.
Comment 1 Mikael Hallendal 2002-04-18 11:12:08 UTC
hmm .. ok, I can take a look at this and see if I can solve it. If the
problem is in the gnome2-man2html (which I fear) I might not have a
clue what to do (that code is hairy).

Thanks,
Comment 2 Young-Ho Cha 2002-05-22 04:52:18 UTC
This is related to gnome2-man2html, so reassigning to libgnome.
Comment 3 Young-Ho Cha 2002-05-22 05:46:02 UTC
Created attachment 8648
korean man page example (ls)
Comment 4 Young-Ho Cha 2002-05-22 06:39:57 UTC
I tested gnome2-man2html with ls.1.gz (a Korean man page sample) like this:

$ zcat /usr/share/man/ko/man1/ls.1.gz | gnome2-man2html > test.html

and rendered the result with Galeon, which shows this:

http://tkp.ulsan.ac.kr/~ganadist/broken.png

Then I tested converting the page to UTF-8 first:

$ zcat /usr/share/man/ko/man1/ls.1.gz | iconv -t utf-8 -f euc-kr |
gnome2-man2html > test.html

and rendered that with Galeon as well:

http://tkp.ulsan.ac.kr/~ganadist/broken1.png

It seems gnome2-man2html converts each non-ASCII byte to an escaped
character (an HTML entity, like the one for "ì"),

but the HTML rendering engine treats each escaped character as a
separate character, so multibyte characters come out broken.
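
The effect described in this comment can be reproduced in a few lines. The sketch below is only an illustration: `escape_high_bytes` is a hypothetical stand-in for the escaping step, not the actual gnome2-man2html code, and it emits numeric character references where the real tool may have used named entities.

```python
# Hypothetical illustration: escaping every high byte separately turns one
# multibyte character into several unrelated single characters.

def escape_high_bytes(data: bytes) -> str:
    """Emit each byte >= 0x80 as its own numeric character reference;
    pass ASCII bytes through unchanged."""
    return "".join(f"&#{b};" if b >= 0x80 else chr(b) for b in data)


korean = "\uac00"                    # one Korean syllable (U+AC00)
euc_kr = korean.encode("euc_kr")     # two bytes in EUC-KR
utf8 = korean.encode("utf-8")        # three bytes in UTF-8

# One character goes in, two or three separate references come out,
# which an HTML renderer shows as two or three Latin-1 characters.
print(escape_high_bytes(euc_kr))
print(escape_high_bytes(utf8))
```

Converting the page to UTF-8 before escaping does not help, because the escaping still operates on individual bytes rather than on characters.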
Comment 5 Luis Villa 2002-05-23 14:11:20 UTC
Pretty serious i18n problem, right, sander?
Comment 6 Sander Vesik 2002-05-23 17:14:16 UTC
Agreed, this is a serious i18n problem.
Comment 7 Federico Mena Quintero 2002-05-23 19:05:29 UTC
Assigning to myself.
Comment 8 Federico Mena Quintero 2002-05-23 19:11:16 UTC
Possibly fixed on CVS; you need a new libgnome.  Could someone please
test this and reopen the bug if it doesn't work?
Comment 9 Federico Mena Quintero 2002-05-23 19:38:18 UTC
Hmm, I am really not sure if it is fixed.

I changed gnome2-man2html not to escape bytes with the most
significant bit set, e.g. it will output a byte 255 instead of the
HTML entity for "ÿ".

With this, if I take your man page and do

    zcat ls.1.gz | gnome2-man2html > foo.html

and then view foo.html with Galeon, I can ask Galeon to use EUC-KR
encoding and it displays fine (with Korean glyphs).

However, if I set

    export LANG=ko_KR.eucKR

and then run

    gnome-help man:///home/federico/ls.1.gz

(i.e. your original file) it displays the first Roman characters of
the man page, and stops as soon as it finds the first Korean character.

I'm not sure what's going on.  What I'm pretty sure about is that
gnome2-man2html is not munging characters now, so the bug should not
be in it but rather in yelp.
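
A small sketch of why the pass-through output only looks right once the viewer is told to use EUC-KR: the bytes are unchanged, and only the charset assumption differs. The sample text and the helper name `decode_as` are assumptions made for this illustration.

```python
# Sketch: the same raw bytes read under two different charset assumptions.

def decode_as(raw: bytes, charset: str) -> str:
    """Decode raw page bytes the way a viewer would for a given charset."""
    return raw.decode(charset)


raw = "\ud30c\uc77c".encode("euc_kr")    # two Korean syllables, four bytes

as_euc_kr = decode_as(raw, "euc_kr")     # viewer set to EUC-KR: 2 characters
as_latin1 = decode_as(raw, "latin-1")    # viewer assuming Latin-1: 4 characters

print(len(raw), len(as_euc_kr), len(as_latin1))
```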
Comment 10 Young-Ho Cha 2002-05-24 04:04:58 UTC
Hmm, gnome2-man2html works properly now,

but its output has no encoding information,

so the HTML rendering engine (libgtkhtml) shows broken characters.

(The Gecko engine can auto-detect the encoding, so it works properly.)

Now we have two possible solutions:

1. Emit an encoding meta tag from gnome2-man2html
2. Add encoding auto-detection to libgtkhtml
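
Option 1 could look roughly like the following sketch. It is written in Python for brevity, not in the C of gnome2-man2html, and `wrap_html` and its parameters are hypothetical names. It also assumes the charset is already known, which is the hard part discussed later in this thread.

```python
import html

# Sketch of option 1: declare the charset in the generated page so the
# renderer does not have to guess. The charset must be supplied by the caller.

def wrap_html(body: str, charset: str) -> bytes:
    """Wrap converted man page text in HTML that declares its own charset."""
    meta = (
        '<meta http-equiv="Content-Type" '
        f'content="text/html; charset={charset}">'
    )
    page = (
        f"<html><head>{meta}</head>"
        f"<body><pre>{html.escape(body)}</pre></body></html>"
    )
    # Encode the page in the same charset that the meta tag announces.
    return page.encode(charset)


page = wrap_html("ls - \ud30c\uc77c", "euc-kr")
print(len(page))
```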
Comment 11 Luis Villa 2002-05-26 16:24:40 UTC
This is very borderline 2.0.0, no? Sander, thoughts?
Comment 12 Sander Vesik 2002-05-26 21:03:51 UTC
I don't think we can realistically take this as a 2.0.0 bug without
knowingly risking a slip with very high probability, or else just
punting it to 2.0.1 anyway.

Comment 13 Federico Mena Quintero 2002-05-27 19:02:40 UTC
Is it terribly difficult to include charset detection code in gtkhtml?
Comment 14 Federico Mena Quintero 2002-05-28 22:48:17 UTC
Created attachment 8799
Patch with incomplete fix :(
Comment 15 Luis Villa 2002-05-28 22:49:42 UTC
Is it even realistic in the 2.0.1 timeframe?
Comment 16 Federico Mena Quintero 2002-05-28 22:52:17 UTC
The patch I just attached has an incomplete fix.  It makes
gnome2-man2html output a META tag with language and charset
information.  However, it appears that what HTML would expect is not
the same thing that you would put in your LANG or LC_MESSAGES variables.

With this patch yelp makes the man page show up as a bunch of nonsense
8-bit characters, rather than proper multibyte Korean characters.

I'm at a loss here.
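
The mismatch between locale codesets and HTML charset names can be bridged with a lookup step. The sketch below is one possible approach, not what the attached patch does: the `HTML_CHARSETS` table and the function name are assumptions, and the table only covers a few codesets.

```python
import re
from typing import Optional

# Hypothetical mapping from normalised locale codesets to the charset
# names that HTML expects. A real table would need many more entries.
HTML_CHARSETS = {
    "euckr": "EUC-KR",
    "eucjp": "EUC-JP",
    "utf8": "UTF-8",
    "iso88591": "ISO-8859-1",
}


def charset_from_locale(locale_name: str) -> Optional[str]:
    """Map a value such as "ko_KR.eucKR" to an HTML charset name, or None."""
    _, dot, rest = locale_name.partition(".")
    if not dot:
        return None                      # no codeset part, e.g. "ko_KR"
    codeset = rest.split("@", 1)[0]      # drop a modifier such as "@euro"
    key = re.sub(r"[^a-z0-9]", "", codeset.lower())
    return HTML_CHARSETS.get(key)


print(charset_from_locale("ko_KR.eucKR"))
```

Even with such a table, the locale only describes the user's environment. It says nothing certain about the charset a particular man page was written in.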
Comment 17 Mikael Hallendal 2002-05-30 12:54:39 UTC
What about always outputting UTF-8 and adding this to the header of
the generated HTML?

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
Comment 18 Federico Mena Quintero 2002-05-31 15:55:47 UTC
The problem is that to convert to UTF-8 I must first know the
character set in which the man page is written.  The program has no
such information, and it would involve charset autodetection code.
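
Short of real autodetection, one fallback is to try a short list of candidate charsets in order. The sketch below assumes the caller supplies that list (for example, derived from the user's locale). It is a heuristic, not detection, and `to_utf8` is a hypothetical helper.

```python
from typing import Sequence

# Heuristic sketch: try candidate charsets in order and convert with the
# first one that decodes cleanly. This can pick a wrong charset that
# happens to decode without error, so it is not true autodetection.

def to_utf8(raw: bytes, candidates: Sequence[str] = ("utf-8", "euc-kr")) -> bytes:
    """Re-encode raw man page bytes as UTF-8 using the first charset that fits."""
    for charset in candidates:
        try:
            return raw.decode(charset).encode("utf-8")
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate charsets could decode the input")


converted = to_utf8("\ud30c\uc77c".encode("euc-kr"))
print(len(converted))
```

Strict UTF-8 goes first because it rejects most legacy-encoded input. A single-byte charset such as ISO-8859-1 accepts any byte sequence, so it only makes sense as the last candidate.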
Comment 19 Mikael Hallendal 2002-05-31 16:35:33 UTC
*ouch* that's crap. Is this a weakness of man?
Comment 20 Federico Mena Quintero 2002-05-31 16:39:14 UTC
Yes.  Man pages are an oooold format and they do not contain any
information about what language or charset they are written in.
Comment 21 Luis Villa 2002-06-05 10:20:14 UTC
Reassigning back to the default maintainers; this would be nice to do,
still, but Sun feels that it is not important for them and so federico
has other things to do with his time :) 
Comment 22 Luis Villa 2002-08-15 01:05:26 UTC
Is this even fixable at all? Any point to keeping it open?
Comment 23 Kjartan Maraas 2002-08-15 08:55:44 UTC
Just pointing out that the reporter's mail is bouncing.
Comment 24 Mikael Hallendal 2002-08-15 13:10:49 UTC
I have no idea how to fix this. Perhaps someone with better knowledge
of how to figure out which character set it's written in (is this
possible at all?) has a better clue.
Comment 25 Young-Ho Cha 2002-08-23 04:25:03 UTC
How about letting the user change the charset in the preferences?
Comment 26 Young-Ho Cha 2002-09-30 14:24:30 UTC
Created attachment 11314
man2html patch against libgnome 2.0.5
Comment 27 Young-Ho Cha 2002-09-30 14:27:56 UTC
I found that Federico's patch was missing HTTP-EQUIV="Content-Type".

After applying this patch, yelp shows the man page properly.
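
The difference the attribute makes can be shown with a strict reader that only honours a charset when the META tag carries http-equiv="Content-Type". This is a sketch of that rule using Python's standard `html.parser`. It is an assumption about how a renderer might behave, not libgtkhtml's actual logic, and `CharsetFinder` and `declared_charset` are hypothetical names.

```python
import re
from html.parser import HTMLParser
from typing import Optional


class CharsetFinder(HTMLParser):
    """Record the charset from the first META Content-Type pragma seen."""

    def __init__(self) -> None:
        super().__init__()
        self.charset: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attr = {name: (value or "") for name, value in attrs}
        # Without http-equiv="Content-Type" the tag is not a charset pragma.
        if attr.get("http-equiv", "").lower() != "content-type":
            return
        match = re.search(r"charset=([\w.:-]+)", attr.get("content", ""), re.I)
        if match:
            self.charset = match.group(1)


def declared_charset(html_text: str) -> Optional[str]:
    """Return the charset a page declares via META, or None if it has none."""
    finder = CharsetFinder()
    finder.feed(html_text)
    return finder.charset


with_pragma = '<meta http-equiv="Content-Type" content="text/html; charset=EUC-KR">'
without_pragma = '<meta content="text/html; charset=EUC-KR">'
print(declared_charset(with_pragma), declared_charset(without_pragma))
```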
Comment 28 Young-Ho Cha 2002-09-30 14:37:19 UTC
See http://ffii.org/archive/mails/groff/2002/Sep/0187.html

There have been some attempts to put encoding information into man pages.
Comment 29 Luis Villa 2002-10-21 14:46:04 UTC
Ooh. Working patch. yay. Can we get this in, Anders?
Comment 30 Kjartan Maraas 2003-05-01 09:23:33 UTC
I'll commit this if nobody tells me not to in three days.
Comment 31 Kjartan Maraas 2003-05-01 11:41:15 UTC
I lied. Committed to both branches.