Bug 705220 - Enables using unicode (non ASCII) to name output files
Status: RESOLVED FIXED
Product: doxygen
Classification: Other
Component: general
Version: 1.8.6-GIT
Hardware: Other All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Dimitri van Heesch
Dimitri van Heesch
Depends on:
Blocks:
 
 
Reported: 2013-08-01 00:49 UTC by Suzumizaki-Kimitaka
Modified: 2014-04-21 10:09 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
fix to enable to make the file named with non ASCII(Unicode) characters (6.19 KB, patch)
2013-08-01 00:49 UTC, Suzumizaki-Kimitaka
Updated patch for 1.8.6 release (2.64 KB, patch)
2013-12-27 06:00 UTC, Suzumizaki-Kimitaka

Description Suzumizaki-Kimitaka 2013-08-01 00:49:15 UTC
Created attachment 250573
fix to enable to make the file named with non ASCII(Unicode) characters

As you know, Unicode can be used in file names,
even for files served over the internet (for example, the non-English versions of Wikipedia).

With the attached patch, doxygen can use Unicode (not only ASCII) characters directly in file names, instead of escaping each byte in the _xHH_xHH_xHH format.

I added a LIMIT_FNAME_WITH_ASCII option; setting it to NO (0) enables this feature. (Of course, the default is YES for compatibility.)

Note: On Windows, another fix is also required. See Bug 705217.

Regards,
Suzumizaki-Kimitaka
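
For readers unfamiliar with the _xHH format mentioned in the description, here is a minimal sketch of what it does to a UTF-8 file name. The function name escapeNonAscii is hypothetical and is not taken from doxygen; the sketch also assumes that only bytes with the high bit set are escaped, whereas doxygen's real routine handles some ASCII characters specially too.

```cpp
#include <string>

// Hypothetical helper, not doxygen's actual function: every byte with the
// high bit set becomes _xHH (uppercase hex); ASCII bytes are copied as-is.
std::string escapeNonAscii(const std::string &name)
{
  static const char hex[] = "0123456789ABCDEF";
  std::string out;
  for (unsigned char b : name)
  {
    if (b < 0x80)
    {
      out += static_cast<char>(b); // plain ASCII, keep unchanged
    }
    else
    {
      out += "_x";                 // escape marker
      out += hex[b >> 4];          // high nibble
      out += hex[b & 0xF];         // low nibble
    }
  }
  return out;
}
```

Under these assumptions, a name containing one three-byte UTF-8 character grows by twelve characters, which is why such file names become hard to read.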
Comment 1 Suzumizaki-Kimitaka 2013-12-27 06:00:33 UTC
Created attachment 264920
Updated patch for 1.8.6 release

Here is an updated patch to catch up with the 1.8.6 release.

Regards,
Suzumizaki-Kimitaka
Comment 2 Dimitri van Heesch 2013-12-30 20:27:37 UTC
Hi Suzumizaki,

Thanks for the update.

I've reworked your patch a bit (result should be the same).
I've named the option 'ALLOW_UNICODE_NAMES' and used the code below.

It uses no 'continue' or 'break' statements and counts the bytes in a slightly different way (based on http://www.opensource.apple.com/source/tidy/tidy-2.2/tidy/src/utf8.c):


char ids[5];
const unsigned char uc = (unsigned char)c;
bool doEscape = TRUE; 
if (allowUnicodeNames && uc <= 0xf7)
{
  const char* pt = p;
  ids[ 0 ] = c;
  int l = 0;
  if ((uc&0xE0)==0xC0)
  {
    l=2; // 110x.xxxx: lead byte of a 2-byte character
  }
  if ((uc&0xF0)==0xE0)
  {
    l=3; // 1110.xxxx: lead byte of a 3-byte character
  }
  if ((uc&0xF8)==0xF0)
  {
    l=4; // 1111.0xxx: lead byte of a 4-byte character
  }
  doEscape = l==0;    
  for (int m=1; m<l && !doEscape; ++m)
  {
    unsigned char ct = (unsigned char)*pt;
    if (ct==0 || (ct&0xC0)!=0x80) // invalid unicode character
    {
      doEscape=TRUE;
    }
    else
    {
      ids[ m ] = *pt++;
    }
  }
  if ( !doEscape ) // got a valid unicode character
  {
    ids[ l ] = 0;
    growBuf.addStr( ids );
    p += l - 1;
  }
}
if (doEscape) // not a valid unicode char or escaping needed
{
  static char map[] = "0123456789ABCDEF";
  unsigned char id = (unsigned char)c;
  ids[0]='_';
  ids[1]='x';
  ids[2]=map[id>>4];
  ids[3]=map[id&0xF];
  ids[4]=0;
  growBuf.addStr(ids);
}
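
The snippet above is an excerpt from inside a loop and does not compile on its own. As a rough, self-contained sketch of the same logic, the function below can be tried in isolation. The name encodeName is hypothetical, std::string stands in for growBuf, and ASCII bytes are simply copied, which is an assumption (the real doxygen function treats some ASCII characters specially). The bounds check against the string length replaces the check for a terminating zero byte.

```cpp
#include <cstddef>
#include <string>

// Hypothetical wrapper around the logic of the snippet above.
std::string encodeName(const std::string &name, bool allowUnicodeNames)
{
  static const char hex[] = "0123456789ABCDEF";
  std::string out;
  const std::size_t n = name.size();
  std::size_t i = 0;
  while (i < n)
  {
    const unsigned char uc = static_cast<unsigned char>(name[i]);
    if (uc < 0x80) // ASCII: copied unchanged in this sketch
    {
      out += name[i++];
      continue;
    }
    // Sequence length announced by the lead byte (0 = not a lead byte).
    std::size_t len = 0;
    if (allowUnicodeNames)
    {
      if      ((uc & 0xE0) == 0xC0) len = 2;
      else if ((uc & 0xF0) == 0xE0) len = 3;
      else if ((uc & 0xF8) == 0xF0) len = 4;
    }
    // All following bytes must exist and look like 10xx.xxxx.
    bool valid = len != 0 && i + len <= n;
    for (std::size_t m = 1; valid && m < len; ++m)
    {
      const unsigned char ct = static_cast<unsigned char>(name[i + m]);
      valid = (ct & 0xC0) == 0x80;
    }
    if (valid) // copy the whole multi-byte character
    {
      out.append(name, i, len);
      i += len;
    }
    else       // escape this single byte as _xHH
    {
      out += "_x";
      out += hex[uc >> 4];
      out += hex[uc & 0xF];
      ++i;
    }
  }
  return out;
}
```

With allowUnicodeNames set to false, every non-ASCII byte is escaped. With it set to true, well-formed multi-byte characters pass through and only stray bytes are escaped.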

Let me know if this works for you too.
Comment 3 Suzumizaki-Kimitaka 2014-01-03 04:18:33 UTC
Hi Dimitri, happy new year!

The code in git seems to work correctly. The only difference is how it handles invalid UTF-8, especially the C0 and C1 lead bytes. But that's no problem for me.

Thank you!
Suzumizaki-Kimitaka
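
Some background on the C0/C1 remark: in well-formed UTF-8 the bytes C0 and C1 never occur, because they could only begin an overlong two-byte encoding of a code point below U+0080. The test (uc&0xE0)==0xC0 in comment 2 still treats them as two-byte lead bytes. The sketch below shows one way a stricter classification could look. The function name utf8SequenceLength is hypothetical, and this is not necessarily what either version of the patch does.

```cpp
// Hypothetical stricter lead-byte check: returns the sequence length that a
// UTF-8 lead byte announces, or 0 if the byte can never start a well-formed
// sequence.
int utf8SequenceLength(unsigned char lead)
{
  if (lead < 0x80) return 1; // 00..7F: ASCII
  if (lead < 0xC2) return 0; // 80..BF: continuation byte; C0, C1: overlong only
  if (lead < 0xE0) return 2; // C2..DF
  if (lead < 0xF0) return 3; // E0..EF
  if (lead < 0xF5) return 4; // F0..F4
  return 0;                  // F5..FF: would encode beyond U+10FFFF
}
```

A fully strict validator would also restrict the second byte after E0, ED, F0 and F4; that is omitted here to keep the example short.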
Comment 4 Dimitri van Heesch 2014-04-21 10:09:36 UTC
This bug was previously marked ASSIGNED, which means it should be fixed in
doxygen version 1.8.7. Please verify if this is indeed the case. Reopen the
bug if you think it is not fixed and please include any additional information 
that you think can be relevant (preferably in the form of a self-contained example).