Bug 596819 – High-bit ISO-8859-1 characters are not always converted to UTF-8 in the output




Summary:	High-bit ISO-8859-1 characters are not always converted to UTF-8 in the output


Status:	RESOLVED FIXED

Product:	doxygen
Classification:	Other
Component:	general
Version:	1.6.1
Hardware:	Other Windows

Importance:	Normal minor
Target Milestone:	---
Assigned To:	Dimitri van Heesch
QA Contact:	Dimitri van Heesch

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2009-09-30 04:48 UTC by Gisbert
Modified:	2009-12-30 13:38 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Config and input file for reproducing the problem. (3.30 KB, application/octet-stream) 2009-09-30 04:48 UTC, Gisbert

Description Gisbert 2009-09-30 04:48:00 UTC

I have an input file in ISO-8859-1 and have set INPUT_ENCODING accordingly. The output is, in general, in UTF-8, as expected. Occasionally, however, an ISO-8859-1 character with its high bit set is copied unchanged to the output. This occurs in HTML (where it just shows up as a funny glyph), in LaTeX (which makes LaTeX choke because it expects clean UTF-8) and in Perlmod.

The attached file exhibits the problem: input line 33 of test.h, output in section "Macro definitions" of test_8h.html.
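
One way to locate such stray bytes in a generated file is to scan it for sequences that do not decode as UTF-8. The following is a minimal Python sketch under that assumption; the function name `find_invalid_utf8` is hypothetical and not part of doxygen, and the data is expected as raw bytes (for example, read from test_8h.html in binary mode).

```python
def find_invalid_utf8(data: bytes) -> list:
    """Return (line_number, byte_offset, byte_value) for each byte
    that cannot be decoded as part of a valid UTF-8 sequence."""
    problems = []
    pos = 0
    while pos < len(data):
        try:
            # Decode the remainder; success means no further bad bytes.
            data[pos:].decode("utf-8")
            break
        except UnicodeDecodeError as err:
            bad = pos + err.start                 # offset in the full buffer
            line = data.count(b"\n", 0, bad) + 1  # 1-based line number
            problems.append((line, bad, data[bad]))
            pos = bad + 1                         # skip the bad byte, go on
    return problems


if __name__ == "__main__":
    # Two clean UTF-8 lines, then one line with a raw ISO-8859-1 byte.
    sample = "abc\nä ok\n".encode("utf-8") + b"x \xe4 y\n"
    for line, offset, value in find_invalid_utf8(sample):
        print(f"line {line}, offset {offset}: byte 0x{value:02X}")
```

A byte such as 0xE4 (the ISO-8859-1 encoding of "ä") followed by plain ASCII is reported, while the properly encoded two-byte form 0xC3 0xA4 is not.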

Comment 1 Gisbert 2009-09-30 04:48:46 UTC

Created attachment 144356
Config and input file for reproducing the problem.

Comment 2 Dimitri van Heesch 2009-10-03 15:17:52 UTC

The main problem here is that in the LaTeX output, source code is reformatted (to prevent page overflows) and the reformatting is not UTF-8 aware. As a result, it could insert characters in the middle of a multibyte character.
I'll correct this.
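
To illustrate the failure mode, here is a minimal Python sketch of breaking a UTF-8 byte string into chunks without cutting a multibyte character in half. It is not doxygen's actual fix; the names `is_continuation` and `wrap_utf8` are hypothetical, and the width is counted in bytes rather than in displayed characters, which is a simplifying assumption.

```python
def is_continuation(byte: int) -> bool:
    """True for UTF-8 continuation bytes, which have the form 10xxxxxx."""
    return (byte & 0xC0) == 0x80


def wrap_utf8(data: bytes, width: int) -> list:
    """Split data into chunks of at most `width` bytes, moving each cut
    back to a character boundary so no multibyte sequence is split."""
    chunks = []
    start = 0
    while start < len(data):
        end = min(start + width, len(data))
        # If the cut would land on a continuation byte, back up to the
        # lead byte of that character.
        while start < end < len(data) and is_continuation(data[end]):
            end -= 1
        if end == start:
            # The width is smaller than one character: emit the whole
            # character rather than splitting it.
            end = start + 1
            while end < len(data) and is_continuation(data[end]):
                end += 1
        chunks.append(data[start:end])
        start = end
    return chunks


if __name__ == "__main__":
    text = "aäb".encode("utf-8")        # b'a\xc3\xa4b'
    print([text[i:i + 2] for i in range(0, len(text), 2)])  # naive cut
    print(wrap_utf8(text, 2))           # boundary-aware cut
```

The naive cut separates 0xC3 from 0xA4, so neither chunk is valid UTF-8 on its own; the boundary-aware version keeps the two bytes together.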

Comment 3 Dimitri van Heesch 2009-10-03 15:50:15 UTC

Actually, Comment 2 above is more in line with what you reported in bug #596807.
I didn't see any invalid characters in test_8h.html anymore (they were there in the official 1.6.1 release, though).

Comment 4 Gisbert 2009-10-05 11:59:03 UTC

OK, I'll re-test when I get the next release. Thanks, Dimitri!
(Btw: great response on all accounts! This is a marvellous job. And I mean all of doxygen.)

Comment 5 Dimitri van Heesch 2009-12-30 13:38:59 UTC

This bug was previously marked ASSIGNED, which means it should be fixed in
doxygen version 1.6.2. Please verify if this is indeed the case and reopen the
bug if you think it is not fixed (include any additional information that you
think can be relevant).