Bug 159291 - utf8 output for html
Status: RESOLVED FIXED
Product: doxygen
Classification: Other
Component: general
Version: unspecified
Hardware: Other Linux
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Dimitri van Heesch
QA Contact: Dimitri van Heesch
Depends on:
Blocks:
 
 
Reported: 2004-11-24 11:23 UTC by Lukasz Michalski
Modified: 2012-11-18 11:12 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
c code to pass mixed iso-8859-1 utf-8 to iso-8859-1 (6.06 KB, text/x-csrc)
2006-05-06 11:49 UTC, Guillermo Ballester Valor
c code to pass mixed iso-8859-1 utf-8 to utf-8 (6.99 KB, text/x-csrc)
2006-05-06 11:51 UTC, Guillermo Ballester Valor

Description Lukasz Michalski 2004-11-24 11:23:54 UTC
I need to generate i18n documentation for a C++ project using doxygen. In
non-C++ files I am using UTF-8 encoding. The documentation generated by doxygen
looks good, but I have to set the encoding to UTF-8 in my web browser manually.

There should be an option to set the output encoding for the generated HTML
documentation.
Comment 1 Christopher Mann 2005-02-11 08:14:55 UTC
The lack of this feature is a blocker for my project because:
 (1) The project files are encoded in UTF-8
 (2) The comments are in French, with accents

I confirm the validity and suggest that it be considered as a bug/enhancement
and not just an enhancement.
Comment 2 Matthias Kaschub 2006-01-20 17:42:21 UTC
I had the same problem:
The source code is UTF-8 encoded; in fact the whole system is UTF-8 (the default on Ubuntu, FC and many other distros).

I fixed it for myself by doing the following on Debian and Ubuntu:
  apt-get source doxygen
  cd doxygen-1.4.6/
  perl -i -p -e 's/iso-8859-1/utf-8/g' src/translator_de.h
  recode iso-8859-1..utf-8 src/translator_de.h
  debuild 

You could do the same for all the other languages.

Unfortunately this is not a general solution, because it will break any project whose sources are neither plain 7-bit nor UTF-8.

I didn't look into the charset translation in translator.cpp to see whether it could also translate from/to multibyte encodings. Perhaps someone can tell offhand?
Comment 3 Yan Morin 2006-03-26 22:46:49 UTC
I have the same problem: my editor always saves as UTF-8, my system (FC4) is configured for UTF-8, my web server (Apache) is configured to serve UTF-8, etc.
Comment 4 unkown 2006-04-03 09:53:30 UTC
Same problem with C++ files which contain UTF-8 characters.
Comment 5 Guillermo Ballester Valor 2006-05-06 11:49:41 UTC
Created attachment 64918 [details]
c code to pass mixed iso-8859-1 utf-8 to iso-8859-1
Comment 6 Guillermo Ballester Valor 2006-05-06 11:51:00 UTC
Created attachment 64919 [details]
c code to pass mixed iso-8859-1 utf-8 to utf-8
Comment 7 Guillermo Ballester Valor 2006-05-06 11:54:51 UTC
Hello,
I had the same problem documenting a Spanish project. My solution was to write a short C program to convert all UTF-8 content to ISO-8859-1. It is fast and works nicely for me. I've attached the code of 'mix2latin1'.

In essence, we let doxygen do its work. Then we 'clean' all the UTF-8 stuff with this simple shell script:

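#!/bin/sh
#
# Usage:
# fix2latin1 /path/to/dir/with/doxygen/generated/code 
#

# Stop if the directory is missing, so no files elsewhere get rewritten
cd "${1}" || exit 1
for fich in *.html ; do
   # Skip the literal pattern when no .html file matches
   [ -f "${fich}" ] || continue
   echo "${fich}"
   # Replace the original only if the conversion succeeded
   mix2latin1 < "${fich}" > "${fich}.tmp" && mv "${fich}.tmp" "${fich}"
done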

Option B was to convert the generated ISO-8859-1 output to UTF-8, but then we would also need to change the HTML charset header. I also wrote the utility 'mix2utf8' for that (attached).
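
The header change that option B requires might be sketched as follows. This is only an illustration: it assumes the generated pages declare their encoding with the literal text `charset=iso-8859-1`, which this report does not confirm, and `fix_charset_header` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the declared charset in every .html file
# of the given directory. Assumes each page contains the literal text
# "charset=iso-8859-1"; pages without it are left unchanged.
fix_charset_header() {
    dir="$1"
    for page in "$dir"/*.html; do
        # Skip the literal pattern when no .html file matches
        [ -f "$page" ] || continue
        # Replace the original only if sed succeeded
        sed 's/charset=iso-8859-1/charset=UTF-8/' "$page" > "$page.tmp" &&
            mv "$page.tmp" "$page"
    done
}
```

It would be called as `fix_charset_header /path/to/html` after the page bytes themselves have been converted to UTF-8, since changing the header alone does not recode anything.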
Comment 8 Dimitri van Heesch 2007-02-25 13:32:55 UTC
Since the latest CVS update Doxygen uses UTF-8 internally for all 
strings and uses iconv to recode the input to UTF-8. For HTML, LaTeX 
and man pages the output is now always UTF-8. For RTF the encoding is 
local and depends on the code page specified in the translator.
The config option USE_WINDOWS_ENCODING has been removed.
A new config option INPUT_ENCODING has been added which can be used to
specify the encoding of the input. Another config option DOXYFILE_ENCODING
can be used to specify the encoding of the config file itself.

Could you please let me know if this fixes the problem?
Comment 9 Dimitri van Heesch 2007-04-05 20:10:03 UTC
This bug was marked "assigned" by me some time ago, which means it should be
fixed in version 1.5.2 and is hereby marked as such. I would kindly request you
to check if this version indeed fixes the problem and reopen the bug report
should you still see the same problem.
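
As a minimal sketch of the two options named in comment 8, the settings could be written to a configuration fragment like this. Only the option names INPUT_ENCODING and DOXYFILE_ENCODING come from this report; the file name `Doxyfile.encoding` and the ISO-8859-1 value are assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical example: write the encoding settings to a small
# configuration fragment. The file name is an assumption.
doxyfile="Doxyfile.encoding"

cat > "$doxyfile" <<'EOF'
# Encoding of the configuration file itself
DOXYFILE_ENCODING = UTF-8
# Encoding of the source files (example value for a Latin-1 project);
# per comment 8, doxygen recodes the input to UTF-8 using iconv
INPUT_ENCODING    = ISO-8859-1
EOF
```

A project whose sources are already UTF-8 would set INPUT_ENCODING to UTF-8 instead. Since the HTML output is always UTF-8 from this version on, no separate output option is involved.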