GNOME Bugzilla – Bug 596819
High-bit ISO-8859-1 characters are not always converted to UTF-8 in the output
Last modified: 2009-12-30 13:38:59 UTC
I have an input file in ISO-8859-1 and have set INPUT_ENCODING accordingly. The output is, in general, in UTF-8, as expected. Occasionally, however, an ISO-8859-1 character with its high bit set is copied unchanged to the output. This occurs in HTML (where it just shows up as a funny glyph), in LaTeX (which makes LaTeX choke because it expects clean UTF-8) and in Perlmod. The attached file exhibits the problem: input line 33 of test.h, output in section "Macro definitions" of test_8h.html.
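To illustrate why an unconverted byte breaks UTF-8 consumers, here is a minimal Python sketch (not doxygen code; the helper names `latin1_to_utf8` and `is_valid_utf8` are hypothetical). Every ISO-8859-1 byte maps to the Unicode code point with the same value, so a high-bit byte such as 0xE9 ("é") must become the two-byte UTF-8 sequence 0xC3 0xA9. A lone 0xE9 left in the output is not valid UTF-8.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    # Each ISO-8859-1 byte maps directly to the code point with the same value;
    # re-encoding as UTF-8 turns high-bit bytes into two-byte sequences.
    return data.decode("iso-8859-1").encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    # A strict UTF-8 decode fails on stray high-bit bytes.
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


raw = b"caf\xe9"                 # "café" encoded in ISO-8859-1
converted = latin1_to_utf8(raw)
print(converted)                 # b'caf\xc3\xa9'
print(is_valid_utf8(raw))        # False: 0xE9 copied unchanged is invalid UTF-8
print(is_valid_utf8(converted))  # True
```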
Created attachment 144356: Config and input file for reproducing the problem.
The main problem here is that in the LaTeX output, source code is reformatted (to prevent lines from overflowing the page), and the reformatting is not UTF-8 aware. As a result, it could insert characters in the middle of a multibyte character. I'll correct this.
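The failure mode can be sketched in Python under one stated assumption: that the reformatter breaks lines at a fixed byte width. This is an illustration, not doxygen's actual (C++) implementation, and `naive_wrap`/`utf8_wrap` are hypothetical names. A UTF-8-aware wrapper avoids splitting inside a character by backing up while the split point falls on a continuation byte (bit pattern `10xxxxxx`).

```python
from typing import List


def is_continuation(b: int) -> bool:
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx
    return (b & 0xC0) == 0x80


def naive_wrap(data: bytes, width: int) -> List[bytes]:
    # Splits every `width` bytes, ignoring character boundaries
    return [data[i:i + width] for i in range(0, len(data), width)]


def utf8_wrap(data: bytes, width: int) -> List[bytes]:
    lines = []
    start = 0
    while start < len(data):
        end = min(start + width, len(data))
        # Back up while the split point lands on a continuation byte,
        # so a multibyte sequence is never cut in two
        while end < len(data) and end > start and is_continuation(data[end]):
            end -= 1
        if end == start:
            # Width is smaller than one character: emit the whole character
            end = start + 1
            while end < len(data) and is_continuation(data[end]):
                end += 1
        lines.append(data[start:end])
        start = end
    return lines


text = "café au lait".encode("utf-8")
print(naive_wrap(text, 4)[0])  # b'caf\xc3': lead byte of "é" separated from its continuation
print(utf8_wrap(text, 4))      # every chunk is valid UTF-8
```

Backing up to the previous character boundary keeps each output line at or below the requested width; the fallback branch only exceeds it when a single character is wider than the limit.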
Actually, Comment 2 above is more in line with what you reported in bug #596807. I didn't see any invalid characters in test_8h.html anymore (they were there in the official 1.6.1 release, though).
OK, I'll re-test when I get the next release. Thanks, Dimitri! (Btw: great response on all counts! This is a marvellous job, and I mean all of doxygen.)
This bug was previously marked ASSIGNED, which means it should be fixed in doxygen version 1.6.2. Please verify if this is indeed the case and reopen the bug if you think it is not fixed (include any additional information that you think can be relevant).