GNOME Bugzilla – Bug 705220
Enables using Unicode (non-ASCII) characters to name output files
Last modified: 2014-04-21 10:09:36 UTC
Created attachment 250573
Fix to allow output files to be named with non-ASCII (Unicode) characters

As you know, Unicode can be used in file names, even for files served over the internet (for example, the non-English versions of Wikipedia).

With this patch, doxygen can use Unicode (not only ASCII) characters directly in file names, instead of encoding them in the _xHH_xHH_xHH format. I added a LIMIT_FNAME_WITH_ASCII option; setting it to NO (0) enables this feature. (Of course, the default is YES, for compatibility.)

Note: on Windows, another fix is also required; see Bug 705217.

Regards,
Suzumizaki-Kimitaka
Created attachment 264920
Updated patch for 1.8.6 release

Here is an updated patch to catch up with the 1.8.6 release.

Regards,
Suzumizaki-Kimitaka
Hi Suzumizaki,

Thanks for the update. I've reworked your patch a bit (the result should be the same). I've named the option 'ALLOW_UNICODE_NAMES' and used the code below, which has no 'continue' or 'break' statements and counts the bytes slightly differently (based on http://www.opensource.apple.com/source/tidy/tidy-2.2/tidy/src/utf8.c):

```cpp
char ids[5];
const unsigned char uc = (unsigned char)c;
bool doEscape = TRUE;
if (allowUnicodeNames && uc <= 0xf7)
{
  const char* pt = p;
  ids[ 0 ] = c;
  int l = 0;
  if ((uc&0xE0)==0xC0)
  {
    l=2; // 110x.xxxx: lead byte of a 2-byte character
  }
  if ((uc&0xF0)==0xE0)
  {
    l=3; // 1110.xxxx: lead byte of a 3-byte character
  }
  if ((uc&0xF8)==0xF0)
  {
    l=4; // 1111.0xxx: lead byte of a 4-byte character
  }
  doEscape = l==0;
  for (int m=1; m<l && !doEscape; ++m)
  {
    unsigned char ct = (unsigned char)*pt;
    if (ct==0 || (ct&0xC0)!=0x80) // invalid unicode character
    {
      doEscape=TRUE;
    }
    else
    {
      ids[ m ] = *pt++;
    }
  }
  if ( !doEscape ) // got a valid unicode character
  {
    ids[ l ] = 0;
    growBuf.addStr( ids );
    p += l - 1;
  }
}
if (doEscape) // not a valid unicode char or escaping needed
{
  static char map[] = "0123456789ABCDEF";
  unsigned char id = (unsigned char)c;
  ids[0]='_';
  ids[1]='x';
  ids[2]=map[id>>4];
  ids[3]=map[id&0xF];
  ids[4]=0;
  growBuf.addStr(ids);
}
```

Let me know if this works for you too.
Hi Dimitri, happy new year!

The code in git seems to work correctly. The only difference is in how invalid UTF-8 is handled, especially the lead bytes 0xC0 and 0xC1. But that's no problem for me.

Thank you!
Suzumizaki-Kimitaka
This bug was previously marked ASSIGNED, which means it should be fixed in doxygen version 1.8.7. Please verify whether this is indeed the case. Reopen the bug if you think it is not fixed, and please include any additional information that you think may be relevant (preferably in the form of a self-contained example).