Bug 705220 – Enables using unicode (non ASCII) to name output files




Summary:	Enables using unicode (non ASCII) to name output files


Status:	RESOLVED FIXED

Product:	doxygen
Classification:	Other
Component:	general
Version:	1.8.6-GIT
Hardware:	Other All

Importance:	Normal enhancement
Target Milestone:	---
Assigned To:	Dimitri van Heesch
QA Contact:	Dimitri van Heesch

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2013-08-01 00:49 UTC by Suzumizaki-Kimitaka
Modified:	2014-04-21 10:09 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
fix to enable to make the file named with non ASCII(Unicode) characters (6.19 KB, patch) 2013-08-01 00:49 UTC, Suzumizaki-Kimitaka
Updated patch for 1.8.6 release (2.64 KB, patch) 2013-12-27 06:00 UTC, Suzumizaki-Kimitaka

Description Suzumizaki-Kimitaka 2013-08-01 00:49:15 UTC

Created attachment 250573
fix to enable to make the file named with non ASCII(Unicode) characters

As you know, Unicode can be used in filenames,
even for files served over the internet (for example, the non-English editions of Wikipedia).

With this patch, doxygen can use Unicode (not only ASCII) characters directly in filenames, instead of the _xHH_xHH_xHH escape format.

I added a LIMIT_FNAME_WITH_ASCII option; setting it to NO (0) enables this feature (of course, the default is YES for compatibility).
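For comparison, the compatibility behavior (option left at its default) can be pictured roughly as follows. This is a minimal sketch, assuming that every byte of a UTF-8 name outside the ASCII range is written as `_x` plus two uppercase hex digits; `escapeNonAscii` is a hypothetical name, not doxygen's API, and doxygen's real escaping also rewrites some ASCII characters, which this sketch ignores.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not doxygen's API): escape each non-ASCII byte of a
// UTF-8 name as "_xHH", i.e. the escaped form the patch makes optional.
std::string escapeNonAscii(const std::string& name)
{
  static const char hex[] = "0123456789ABCDEF";
  std::string out;
  for (char c : name)
  {
    const unsigned char uc = static_cast<unsigned char>(c);
    if (uc < 0x80)
    {
      out += c;             // ASCII byte: copied unchanged in this sketch
    }
    else
    {
      out += "_x";          // non-ASCII byte: written as _xHH
      out += hex[uc >> 4];
      out += hex[uc & 0xF];
    }
  }
  return out;
}
```

With this scheme a name such as `Straße` (ß is the two UTF-8 bytes C3 9F) becomes `Stra_xC3_x9Fe`, which is the kind of output the patch lets users avoid.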

Note: on Windows, another fix is also required. See Bug 705217.

Regards,
Suzumizaki-Kimitaka

Comment 1 Suzumizaki-Kimitaka 2013-12-27 06:00:33 UTC

Created attachment 264920
Updated patch for 1.8.6 release

Here is an updated patch that catches up with the 1.8.6 release.

Regards,
Suzumizaki-Kimitaka

Comment 2 Dimitri van Heesch 2013-12-30 20:27:37 UTC

Hi Suzumizaki,

Thanks for the update.

I've reworked your patch a bit (result should be the same).
I've named the option 'ALLOW_UNICODE_NAMES' and used the code below.

It uses no 'continue' or 'break' statements and counts the bytes in a slightly different way (based on http://www.opensource.apple.com/source/tidy/tidy-2.2/tidy/src/utf8.c):


// Assumed context: this fragment handles non-ASCII bytes only; c is the
// current byte, p points at the byte after it, growBuf collects the output.
char ids[5];
const unsigned char uc = (unsigned char)c;
bool doEscape = TRUE;
if (allowUnicodeNames && uc <= 0xf7)
{
  const char* pt = p;
  ids[ 0 ] = c;
  int l = 0;
  if ((uc&0xE0)==0xC0)
  {
    l=2; // 110x.xxxx: lead byte of a 2-byte sequence
  }
  if ((uc&0xF0)==0xE0)
  {
    l=3; // 1110.xxxx: lead byte of a 3-byte sequence
  }
  if ((uc&0xF8)==0xF0)
  {
    l=4; // 1111.0xxx: lead byte of a 4-byte sequence
  }
  doEscape = l==0;
  for (int m=1; m<l && !doEscape; ++m)
  {
    unsigned char ct = (unsigned char)*pt;
    if (ct==0 || (ct&0xC0)!=0x80) // not a 10xx.xxxx continuation byte: invalid UTF-8
    {
      doEscape=TRUE;
    }
    else
    {
      ids[ m ] = *pt++;
    }
  }
  if ( !doEscape ) // got a valid unicode character
  {
    ids[ l ] = 0;
    growBuf.addStr( ids );
    p += l - 1;
  }
}
if (doEscape) // not a valid unicode char or escaping needed
{
  static char map[] = "0123456789ABCDEF";
  unsigned char id = (unsigned char)c;
  ids[0]='_';
  ids[1]='x';
  ids[2]=map[id>>4];
  ids[3]=map[id&0xF];
  ids[4]=0;
  growBuf.addStr(ids);
}
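To try the reworked logic outside doxygen, the fragment can be wrapped in a small function. This is a sketch under stated assumptions: `escapeName` is a hypothetical name, `std::string` stands in for `growBuf`, ASCII bytes are simply copied (doxygen handles them in other branches), and the byte loop is written so that `p` already points past the current byte, which is what the fragment expects.

```cpp
#include <cassert>
#include <string>

// Hypothetical harness (not doxygen's API) around the reworked fragment.
// std::string stands in for growBuf; ASCII bytes are copied unchanged here,
// whereas doxygen handles them in separate branches.
std::string escapeName(const std::string& name, bool allowUnicodeNames)
{
  static const char map[] = "0123456789ABCDEF";
  std::string out;
  const char* p = name.c_str();
  char c;
  while ((c = *p++) != 0)  // afterwards p points at the byte following c
  {
    const unsigned char uc = static_cast<unsigned char>(c);
    if (uc < 0x80)
    {
      out += c;  // plain ASCII
    }
    else
    {
      bool doEscape = true;
      if (allowUnicodeNames && uc <= 0xF7)
      {
        int l = 0;                        // sequence length from the lead byte
        if ((uc & 0xE0) == 0xC0) l = 2;   // 110x.xxxx
        if ((uc & 0xF0) == 0xE0) l = 3;   // 1110.xxxx
        if ((uc & 0xF8) == 0xF0) l = 4;   // 1111.0xxx
        doEscape = (l == 0);
        for (int m = 1; m < l && !doEscape; ++m)
        {
          const unsigned char ct = static_cast<unsigned char>(p[m - 1]);
          if (ct == 0 || (ct & 0xC0) != 0x80)  // missing continuation byte
          {
            doEscape = true;
          }
        }
        if (!doEscape)  // well-formed by these rules: copy all l bytes
        {
          out.append(p - 1, l);
          p += l - 1;
        }
      }
      if (doEscape)  // invalid sequence or Unicode not allowed: _xHH
      {
        out += "_x";
        out += map[uc >> 4];
        out += map[uc & 0xF];
      }
    }
  }
  return out;
}
```

With `allowUnicodeNames` set, a well-formed name such as `Straße` passes through unchanged, while a sequence cut short at the end of the string (the bytes E3 81) falls back to `_xE3_x81`. Note that these masks accept the lead bytes C0 and C1 as starting 2-byte sequences, which is the point raised in Comment 3.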

Let me know if this works for you too.

Comment 3 Suzumizaki-Kimitaka 2014-01-03 04:18:33 UTC

Hi Dimitri, happy new year!

The code in git seems to work correctly. The difference from my patch is in how invalid UTF-8 is handled, especially the C0 and C1 lead bytes. But that's no problem for me.
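The C0/C1 remark can be made concrete by comparing the mask test used in the fragment with the lead-byte ranges RFC 3629 allows (C2 to DF, E0 to EF, F0 to F4). This is a sketch with hypothetical helper names; it checks only the lead byte, so it is not a complete UTF-8 validator (RFC 3629 also restricts the second byte after E0, ED, F0 and F4).

```cpp
#include <cassert>

// Hypothetical helper: expected UTF-8 sequence length for a lead byte,
// following the lead-byte ranges of RFC 3629. Returns 0 for bytes that can
// never start a well-formed sequence.
int strictUtf8Length(unsigned char lead)
{
  if (lead < 0x80) return 1;                   // ASCII
  if (lead >= 0xC2 && lead <= 0xDF) return 2;  // C0/C1 excluded (overlong)
  if (lead >= 0xE0 && lead <= 0xEF) return 3;
  if (lead >= 0xF0 && lead <= 0xF4) return 4;  // F5 and up exceed U+10FFFF
  return 0;                                    // continuation or invalid byte
}

// Mask-based classification as in the reworked fragment (meant for
// non-ASCII bytes only); returns 0 when no mask matches.
int maskUtf8Length(unsigned char lead)
{
  if ((lead & 0xE0) == 0xC0) return 2;
  if ((lead & 0xF0) == 0xE0) return 3;
  if ((lead & 0xF8) == 0xF0) return 4;
  return 0;
}
```

Both helpers agree on ordinary lead bytes such as C3 or E3, but the mask test treats C0, C1 and F5 to F7 as valid leads while the strict ranges reject them. C0 and C1 could only start overlong encodings of ASCII characters, so accepting them is harmless for naming files but is not strict UTF-8.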

Thank you!
Suzumizaki-Kimitaka

Comment 4 Dimitri van Heesch 2014-04-21 10:09:36 UTC

This bug was previously marked ASSIGNED, which means it should be fixed in
doxygen version 1.8.7. Please verify whether this is indeed the case. Reopen the
bug if you think it is not fixed, and please include any additional information
that you think may be relevant (preferably in the form of a self-contained example).