GNOME Bugzilla – Bug 159291
utf8 output for html
Last modified: 2012-11-18 11:12:50 UTC
I need to generate i18n documentation for a C++ project using doxygen. In non-C++ files I am using UTF-8 encoding. The documentation generated by doxygen looks good, but I have to set the encoding to UTF-8 in my web browser manually. There should be an option to set the output encoding for the generated HTML documentation.
The lack of this feature is a blocker for my project because: (1) the project files are encoded in UTF-8, and (2) the comments are in French with accents. I confirm the validity of this report and suggest that it be treated as a bug/enhancement and not just an enhancement.
I had the same problem: source code in UTF-8 encoding; actually the whole system is UTF-8 (the default on Ubuntu, FC and many other distros). I fixed it for myself by doing the following on Debian and Ubuntu:

    apt-get source doxygen
    cd doxygen-1.4.6/
    perl -i -p -e 's/iso-8859-1/utf-8/g' src/translator_de.h
    recode iso-8859-1..utf-8 src/translator_de.h
    debuild

You could do the same for all the other languages. Unfortunately this is not a general solution, because it will break any project whose sources contain non-7-bit characters in an encoding other than UTF-8. I didn't look into the charset translation in translator.cpp to see whether it would also be possible to translate from/to multibyte encodings there. Perhaps someone can tell offhand?
I have the same problem: my editor always saves as UTF-8, my system (FC4) is configured for UTF-8, my web server (Apache) is configured to serve UTF-8, etc.
Same problem with C++ files which contain UTF-8 characters.
Created attachment 64918 [details] c code to pass mixed iso-8859-1 utf-8 to iso-8859-1
Created attachment 64919 [details] c code to pass mixed iso-8859-1 utf-8 to utf-8
Hello, I had the same problem documenting a Spanish project. My solution was to write a short C program that converts all UTF-8 text to ISO-8859-1. It is fast and works well for me. I've attached the code of 'mix2latin1'. In essence, we let doxygen do its work, then 'clean' all the UTF-8 text with this simple shell script:

    #!/bin/sh
    #
    # Usage:
    #   fix2latin1 /path/to/dir/with/doxygen/generated/code
    #
    cd "${1}" || exit 1
    for fich in *.html ; do
        echo "${fich}"
        mix2latin1 < "${fich}" > "${fich}.tmp"
        mv "${fich}.tmp" "${fich}"
    done

Option B was to convert the generated ISO-8859-1 output to UTF-8, but then we would also need to change the HTML charset header. I also wrote the utility 'mix2utf8' (attached).
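For readers without the attachments, the idea of normalizing mixed ISO-8859-1/UTF-8 text can be sketched in Python. This is not the attached C code. It is a minimal illustration that assumes any byte sequence forming valid UTF-8 was meant as UTF-8, and that every other high byte is a stray ISO-8859-1 character. The function names merely mirror the attachment names, and the error-handler name `latin1_fallback` is made up for this example.

```python
import codecs


def _latin1_fallback(err):
    """Decode bytes that are not valid UTF-8 as ISO-8859-1 instead."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    return bad.decode("iso-8859-1"), err.end


# Hypothetical handler name, registered once for use with bytes.decode().
codecs.register_error("latin1_fallback", _latin1_fallback)


def mixed_to_text(data: bytes) -> str:
    """Decode input that mixes UTF-8 sequences with ISO-8859-1 bytes."""
    return data.decode("utf-8", errors="latin1_fallback")


def mix2latin1(data: bytes) -> bytes:
    """Normalize mixed input to ISO-8859-1 ('?' for unrepresentable chars)."""
    return mixed_to_text(data).encode("iso-8859-1", errors="replace")


def mix2utf8(data: bytes) -> bytes:
    """Normalize mixed input to UTF-8."""
    return mixed_to_text(data).encode("utf-8")


if __name__ == "__main__":
    # "niño" with an ISO-8859-1 ñ, "café" with a UTF-8 é.
    sample = b"ni\xf1o caf\xc3\xa9"
    print(mix2latin1(sample))
    print(mix2utf8(sample))
```

The heuristic is ambiguous by nature: two ISO-8859-1 characters that happen to form a valid UTF-8 sequence (for example the bytes of "Ã©") will be read as one UTF-8 character. For the UTF-8 direction, the charset declared in each HTML file would still need to be changed separately.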
Since the latest CVS update, Doxygen uses UTF-8 internally for all strings and uses iconv to recode the input to UTF-8. For HTML, LaTeX and man pages the output is now always UTF-8. For RTF the encoding is local and depends on the code page specified in the translator. The config option USE_WINDOWS_ENCODING has been removed. A new config option, INPUT_ENCODING, has been added, which can be used to specify the encoding of the input. Another config option, DOXYFILE_ENCODING, can be used to specify the encoding of the config file itself. Could you please let me know if this fixes the problem?
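As an illustration of the two options named above, a Doxyfile for a project whose sources are still in ISO-8859-1 might contain something like the fragment below. The values are assumptions chosen for the example; projects whose sources are already UTF-8 should not need to change either option, since UTF-8 is the default for both.

```
# Encoding of this configuration file itself.
DOXYFILE_ENCODING = UTF-8

# Encoding of the source files doxygen reads; they are recoded to UTF-8
# via iconv, so the name should be one that iconv recognizes.
INPUT_ENCODING    = ISO-8859-1
```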
This bug was marked "assigned" by me some time ago, which means it should be fixed in version 1.5.2 and is hereby marked as such. I would kindly request you to check if this version indeed fixes the problem and reopen the bug report should you still see the same problem.