Bug 686259 - index.qhp malformed when using special characters in \addindex
Status: RESOLVED FIXED
Product: doxygen
Classification: Other
Component: general
Version: 1.8.2-SVN
Hardware: Other Windows
Importance: Normal major
Target Milestone: ---
Assigned To: Dimitri van Heesch
QA Contact: Dimitri van Heesch
Depends on:
Blocks:
 
 
Reported: 2012-10-17 02:14 UTC by Bastiaan
Modified: 2013-05-19 12:35 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
patch (1.42 KB, patch), 2012-10-17 02:14 UTC, Bastiaan
Example (21.59 KB, application/zip), 2012-10-29 08:43 UTC, Bastiaan

Description Bastiaan 2012-10-17 02:14:47 UTC
Created attachment 226600: patch

In an attempt to test the improvements in doxygenw20120930_1_8_2.zip I encountered the following:

\addindex coördinaten

produces this item in index.qhp:

<keyword name="coördinaten" id="coördinaten" ref="newlay.html#coördinaten" />

As a consequence, qhelpgenerator fails with the message "Encountered incorrectly encoded content."


I am attaching a patch that percent-encodes the anchor fragment and handles Unicode correctly. It is modelled after QUrl of Qt 4.7. It does not touch the name and id attributes, but it solves the above-mentioned problem.
Comment 1 Dimitri van Heesch 2012-10-28 11:31:17 UTC
Hi Bastiaan,

I'm not sure your solution would work for general UTF-8 characters (i.e. multi-byte characters).

Did you configure your input encoding correctly (see INPUT_ENCODING)? Then I would expect that the anchor would be UTF-8 encoded as well.

Can you attach a self-contained example (source+config file in a zip) that allows me to reproduce this issue?
Comment 2 Bastiaan 2012-10-29 08:43:05 UTC
Created attachment 227507: Example

Hi Dimitri,

Here is the example. Yes, INPUT_ENCODING is left at UTF-8. As far as I can see, ö is stored as a multi-byte character in the name and id fields of the keyword: it shows as two characters (coÃ¶rdinaten) when index.qhp is viewed in ISO 8859-15 encoding (Kate). The anchor, however, still displays as "#coördinaten" in that view. So it looks like the name and id are UTF-8 encoded but the anchor is not. Consequently, the file cannot be opened as UTF-8.
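
The observation above can be reproduced outside Doxygen. The following is only a rough sketch: the byte layout of the broken file is an assumption based on the description, not something read from the attached index.qhp, and `is_valid_utf8` is a hypothetical helper.

```python
# Illustration only: byte values are assumed, not taken from the attachment.
word = "co\u00f6rdinaten"  # "coördinaten", with ö written as U+00F6

utf8_bytes = word.encode("utf-8")            # ö becomes the two bytes 0xC3 0xB6
as_latin9 = utf8_bytes.decode("iso8859-15")  # what an ISO 8859-15 viewer shows

# Assumed shape of the broken file: UTF-8 name next to a single-byte (0xF6) anchor.
mixed = utf8_bytes + b" #" + word.encode("iso8859-15")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(len(word), len(utf8_bytes))  # character count vs. UTF-8 byte count
print(as_latin9)                   # the two-character rendering of ö
print(is_valid_utf8(utf8_bytes), is_valid_utf8(mixed))
```

If the file really mixes the two encodings like this, a strict UTF-8 reader would reject it, which would be consistent with the qhelpgenerator message.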

Apart from that, the individual bytes of the multi-byte character need to be percent-encoded in the URI, giving "co%C3%B6rdinaten". Many browsers will decode that URI and display the encoded characters normally, as "#coördinaten" in this case. The patch is a generic solution: it also works for spaces, quotes, etc.
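
For readers without the patch at hand, the encoding step described above can be sketched in a few lines of Python. This is not the patch itself (that is C++ inside Doxygen); `encode_fragment` is a hypothetical name, and the choice to escape everything except unreserved URI characters is an assumption.

```python
from urllib.parse import quote, unquote


def encode_fragment(anchor: str) -> str:
    """Percent-encode the UTF-8 bytes of an anchor name.

    Letters, digits and "-", ".", "_", "~" pass through unchanged;
    every other byte becomes %XX. safe="" means "/" is escaped too.
    """
    return quote(anchor, safe="", encoding="utf-8")


anchor = "co\u00f6rdinaten"  # "coördinaten", with ö written as U+00F6
encoded = encode_fragment(anchor)

print(encoded)                     # pure-ASCII fragment
print(unquote(encoded) == anchor)  # decoding restores the original text
print(encode_fragment('a b"c'))    # spaces and quotes are escaped as well
```

The result is plain ASCII, so the ref attribute no longer depends on the encoding the file is read in.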

With the patch, index.qhp can be opened as UTF-8; the name and id then show as "coördinaten", the anchor is percent-encoded, and qhelpgenerator is happy.

Hope this helps,
Bastiaan.
Comment 3 Dimitri van Heesch 2013-05-11 19:49:26 UTC
Thanks, I'll include the patch in the next subversion update.
Comment 4 Dimitri van Heesch 2013-05-19 12:35:32 UTC
This bug was previously marked ASSIGNED, which means it should be fixed in
doxygen version 1.8.4. Please verify if this is indeed the case. Reopen the
bug if you think it is not fixed and please include any additional information
that you think can be relevant.