GNOME Bugzilla – Bug 570592
Cannot open or save files or paths with localized characters
Last modified: 2010-01-24 19:14:29 UTC
Please fix Dia so that it can open and save files that have any characters that the file system accepts in their filename or path. For example, at the moment Dia can't open or save files like these:
- C:\Töllö\my-file.dia
- C:\My Files\määrittely.dia
It's really annoying. I already wrote a very extensive bug report about this, but Bugzilla bugged out and lost everything I wrote, so my motivation dropped - hence this brief bug report.
There is a bug in Dia 0.96 with writing to "partially" write-protected directories, i.e. everything below "My Documents" - see bug #504469 (the error message I get is "Not allowed to write temporary files ..."). I'm not aware of problems with localized filenames, and have just tested again with Dia-0.96.1-9. Of course your mileage may vary, but then you need to find out what's different on your system.
I'm sure this issue was due to localized characters since saving didn't work with a folder named "Projektikäsikirja", but when I changed the folder name to "Projektikasikirja" everything worked fine. I'm using Dia-0.96.1-8. I'll upgrade sometime next week and try again.
I just installed the latest Dia-0.96.1-9 and tested it:
1. Opened the file "toimintakaavio.dia" => works
2. Saved the file with name "toimintakaavö.dia" => works
3. Closed Dia and reopened the file "toimintakaavö.dia" => Error: unknown filetype (translated from Finnish)
So the issue still persists partially. I'd be very happy if you can fix this in time for the release of 0.97!
This may as well be fixed already, e.g. by:
2007-03-17 Hans Breuer <hans@breuer.org>
* app/app_procs.c app/autosave.c app/commands.c app/diaconv.c
  app/export_png.c app/filedlg.c app/load_save.c app/paginate_psprint.c
  app/preferences.c app/render_eps.c app/sheets_dialog.c
  app/sheets_dialog_callbacks.c lib/dia_dirs.c lib/dia_xml.c
  lib/diagdkrenderer.c plug-ins/cgm/cgm.c plug-ins/dxf/dxf-export.c
  plug-ins/dxf/dxf-import.c plug-ins/hpgl/hpgl.c
  plug-ins/metapost/render_metapost.c plug-ins/pgf/render_pgf.c
  plug-ins/pstricks/render_pstricks.c plug-ins/python/pydia-render.c
  plug-ins/shape/shape-export.c plug-ins/svg/render_svg.c
  plug-ins/vdx/vdx-export.c plug-ins/vdx/vdx-import.c plug-ins/wmf/wmf.cpp
  plug-ins/wpg/wpg.c plug-ins/xfig/xfig-export.c plug-ins/xfig/xfig-import.c
  plug-ins/xslt/xslt.c :
  use <glib/gstdio.h> to match GLib's filename encoding to the io functions used, that is: g_open, g_fopen, g_stat, g_unlink, g_mkdir, g_rename (, g_access, g_lstat, g_remove, g_freopen, g_chdir, g_rmdir). Also replace gzopen() with gzdopen(g_open(), ...) to properly handle unicode filenames; finally use g_mkstemp(). Fixes bug #131210 and bug #397159.
To make this fully work on win32 a recent enough version of libxml2 is required - tested with 2.6.27 - but anything from 2.6.24 should do.
Strangely enough I can reproduce this with 0.96.1-9 from dia-installer and I don't know what's going on here:
D:\graph\Dia-0.96.1-9\bin>dia --verbose
(dia.exe:2524): Gtk-WARNING **: GtkSpinButton: setting an adjustment with non-zero page size is deprecated
Redirecting output to win32trace remote collector
file:///C:/Dokumente%20und%20Einstellungen/hb/zu%20bl%C3%B6d.dia:1: parser error : Start tag expected, '<' not found
[Invalid UTF-8] \x1f\x8b\x08
^
My first assumption of a slightly broken libxml used in the setup does not hold (I checked with the same version which works for my build) but still I get the above message. BUT: when uncompressing the file to something still including the localized characters it can be loaded! Steffen, any idea what's wrong with your build?
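Note on the parser error above: the bytes \x1f\x8b\x08 are how a gzip stream starts (two magic bytes followed by compression method 8, deflate), so the XML parser was apparently handed the still-compressed file contents. A minimal sketch of such a check in plain C follows; looks_like_gzip() is a hypothetical helper for illustration and is not part of Dia or libxml2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from Dia or libxml2: returns 1 if the
 * buffer starts with the gzip magic bytes 0x1f 0x8b followed by the
 * "deflate" compression method (8), and 0 otherwise. */
int looks_like_gzip(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    return len >= 3 && buf[0] == 0x1f && buf[1] == 0x8b && buf[2] == 0x08;
}
```

Running the first few bytes of a .dia file through a check like this tells a compressed diagram from an uncompressed one, which matches the observation that the uncompressed file loads fine.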
Unfortunately, not really. I've recently added the Windows Platform SDK and noted that it adds things to the PATH. I'll try to rebuild with these things removed. I tried with the latest libxml, zlib and iconv DLLs from Igor Zlatkovic, but the problem is still there.
Hans, using your working set of binaries: Does the problem originate from dia-app.dll or libdia.dll?
Rebuilding without any Windows Platform SDK locations in the path didn't help.
Not sure why it ever worked for me before; after switching to libxml 2.7.3 (built from svn) I see the same problem. Reported as libxml2 bug #574393 - not sure yet if there needs to be a workaround in Dia.
*** Bug 576737 has been marked as a duplicate of this bug. ***
*** Bug 552463 has been marked as a duplicate of this bug. ***
Hans, what do you think about the following workaround (win32 only):
* If the file is compressed, gunzip it into a memory buffer and use xmlParseMemory() instead of xmlParseFile()
Did I miss something?
I don't think adding a work-around is leading us anywhere, and that one would certainly be much too huge. Why should everyone who does *not* use the broken combination suffer from the defect in libxml? The right thing would of course be to write a patch for libxml. But did you notice that I already gave an answer?
http://mail.gnome.org/archives/dia-list/2009-April/msg00062.html
At 17.04.2009 22:36, Hans Breuer wrote:
>> http://bugzilla.gnome.org/show_bug.cgi?id=570592
>>
> It certainly would be nice to have this fixed, but I will not make the
> dia-0.97 release depend on it. Apparently our definitions of "showstopper"
> are very different.
> For me a showstopper does not have a simple workaround. This issue has two:
> don't use localized filenames (or directories) and compressed diagrams
> together. [...]
> The work-around I'm pondering could be to convert from utf-8 filenames to
> locale filename before talking to libxml2, but that still would not work if
> the chosen filename is not convertible into the locale encoding, e.g.
> saving with a japanese filename on a german windows. (I'm uncertain if the
> work-around would work at all on a japanese windows version.)
>
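To illustrate why the pondered utf-8 to locale conversion cannot cover every filename, here is a minimal sketch in plain C. It uses ISO-8859-1 (Latin-1) as a stand-in for "the locale encoding"; that choice and the function name utf8_to_latin1() are assumptions for illustration only. A real implementation would rather rely on a library conversion such as GLib's g_locale_from_utf8().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: convert a UTF-8 string to ISO-8859-1 (Latin-1),
 * used here as a stand-in for "the locale encoding".
 * Returns the number of bytes written (excluding the terminating NUL),
 * or -1 if the input is malformed, contains a character Latin-1 cannot
 * represent, or does not fit into the output buffer. */
int utf8_to_latin1(const char *in, char *out, size_t out_size)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;

    assert(in != NULL && out != NULL);
    while (*p != '\0') {
        unsigned int cp;
        if (*p < 0x80) {
            cp = *p++;                        /* plain ASCII */
        } else if ((*p & 0xE0) == 0xC0 && (p[1] & 0xC0) == 0x80) {
            cp = ((unsigned int)(*p & 0x1F) << 6) | (p[1] & 0x3F);
            if (cp < 0x80)
                return -1;                    /* overlong encoding */
            p += 2;                           /* two-byte sequence */
        } else {
            /* Stray or truncated bytes, or a 3/4-byte sequence, which
             * always encodes a code point above U+00FF. */
            return -1;
        }
        if (cp > 0xFF || n + 1 >= out_size)
            return -1;
        out[n++] = (char)cp;
    }
    if (n >= out_size)
        return -1;
    out[n] = '\0';
    return (int)n;
}
```

With this sketch a name like "blöd" converts without loss, while a Japanese name is rejected, which is the "japanese filename on a german windows" case described in the quoted mail.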
*** Bug 561234 has been marked as a duplicate of this bug. ***
*** Bug 584115 has been marked as a duplicate of this bug. ***
libxml2 2.7.4 does include the fix for this issue (bug #574393)
I just tested this with the latest Dia 0.97 and the error still exists. This has _not_ been fixed yet. The behavior is exactly the same as I described for version 0.96-9 in my post from 2009-02-11 10:04:41 UTC. Opening files with locale characters is impossible!
The current Dia 0.97 release does not yet contain libxml2 2.7.4. I'm still waiting for the "official" win32 binaries of the new libxml2 versions. Note that the problem does not occur if you do not use file compression (uncompressing the files also solves the problem).
As there's still no "official" libxml2 win32 binaries with the fix available, I've compiled the library myself: http://sourceforge.net/projects/dia-installer/files/libxml2/2.7.6/libxml2-2.7.6-bin.zip/download Just replace the file dia/bin/libxml2.dll of your Dia installation with the one from the zip file. Feedback is welcome. Note that the new DLL seems to cause a problem with diashapes.exe (unable to download the sheets.xml file). Sorry that this is taking so long.
I just tried with the new libxml2.dll from http://sourceforge.net/projects/dia-installer/files/libxml2/2.7.6/libxml2-2.7.6-bin.zip/download Saving files with localized characters now works! However, opening still does not.
If you are opening the file via explorer, please see bug #591302. Otherwise I don't have any idea - and it is contradictory to other people's tests: http://mail.gnome.org/archives/dia-list/2009-November/msg00000.html
Could you describe as detailed as possible how you try to open the file? If you're familiar with the commandline, could you try to open the program with dia.exe instead of diaw.exe as this might provide additional error messages. Are there any error messages? What is the exact path/filename you're using? Is this some kind of special drive? Which OS do you use exactly?
The steps to reproduce the error are the same as I've described in this bug post before. I'll repeat with a bit more detail:
1. Open Dia
2. Make graph
3. Select from menu File > Save, set the filename to "etäkäyttö.dia" and choose "My Documents" as folder. => This works now with the new libxml2.dll!
4. Close Dia.
5. Open "My Documents" with the file browser, double click on the file with localized characters that you made in step 3. => Nothing happens. I suspect Dia launches in the background but is unable to open the file.
6. Open Dia.
7. Select from menu File > Open, browse to "My Documents"
8. Select the file you made in step 3. => Dia complains that the filetype is unknown and file opening fails.
My OS is Windows XP.
Created attachment 148530 [details] Error message when opening a file having special characters in the filename
Created attachment 148531 [details] Error message when opening a file having special characters in the filename
(In reply to comment #22)
> Could you describe as detailed as possible how you try to open the file?
>
> If you're familiar with the commandline, could you try to open the program
> with dia.exe instead of diaw.exe as this might provide additional error
> messages.
>
> Are there any error messages?
First I want to confirm this bug for both a French Windows XP and a German Windows 2000 with the current stable version (0.97). And I can reproduce it on different machines. As requested I started dia.exe from the command line. See attachments for output.
> What is the exact path/filename you're using? Is this some kind of special
> drive?
The two attached examples were created in virtual machines, but the behaviour is the same on standard hardware (DELL PC, IDE/SATA harddisk). I had admin permissions when testing.
> Which OS do you use exactly?
Windows XP Professional SP3 (5.1, Build 2600) French
Windows 2000 Professional SP4 (5.0, Build 2195) German
Both are localized versions (not international + language pack) and both are fully patched.
HTH
Created attachment 149117 [details] [review] Use win32 wide character API to support localized file names
I've tested this patch successfully with
* German Windows XP
  * cmd.exe, German and Arabic filename (not displayed correctly in cmd.exe)
  * explorer.exe, German and Arabic filename (displayed correctly)
* Windows 7
  * explorer.exe
  * Powershell.exe
What is still missing: Update dia-win-remote.exe accordingly
Have to test on Linux...
The patch does not look quite right to me, e.g. I think it will break with most of the other commandline options. I've just committed a different approach to trunk extending the use of GOption (ideas inspired from The GIMP's sources). The filename given on the commandline needs to be representable in the locale encoding, so this won't work for Japanese filenames on German windows. For these cases the GUI needs to be used, which does not share the commandline restrictions. If some testing does not reveal regressions this could be merged to dia-0-97.
You're right, I focused just on the filenames. But I think that using the win32 wide character API is a must, because the ordinary user will take explorer.exe as the reference and not cmd.exe. After installation of the east-asian fonts, my German XP displays Japanese filenames nicely in explorer.exe. And handling/opening of these files worked with my patch, even when the fonts aren't installed. And I think that we'll not be able to avoid the #ifdefs, because this is really Windows specific - e.g. Linux shells use UTF-8 by default. The perception of the ordinary user that Dia is unable to open the saved file is a serious issue - we should definitely avoid this.
There must be a way to make it work with explorer and cmd. But probably the win32 wchar version can not use GOption at all, because there every string is converted by g_locale_to_utf8() (see: glib/goption.c). Maybe app_init() should be split further to make the necessary parts available for some reimplementation of WinMain - with its own wchar command line parsing and conversion to utf8. Thus dia.exe could work as is, but not support filenames with non-locale encoding. And diaw.exe would support the limited subset necessary for explorer 'integration': filenames and some fileformat conversions (the context menu created by the installer). Although I'm not considering the non-locale filenames important, I will review possible patches.
I'm giving the whole thing another try:
* diaw.exe recoding using wide character commandline options and recoding everything to UTF-8
I'm not sure though, what's the best way to pass UTF-8 filenames to app_init() - do you have an idea?
There already is a utf-8 list of strings in app_init() - you could make it an extra, optional (default: NULL) parameter of app_init. But still this sounds like asking for trouble, at least the filenames in that list need to be removed from argc/argv - I still like the splitting idea from above better...
Created attachment 150573 [details] [review] Allow to pass wide character filenames to diaw.exe on the commandline What do you think about the attached patch?
BTW: a better way of solving this would be an accepted patch for bug #522131.
I only looked at your patch, did not test it myself:
* the current combination of WinMain() files and still passing the full argv into app_init() looks like every file found by the former will be opened twice by the latter (except the files which can not be represented in the local encoding, but they would spit a warning still)
* every string conversion which is not a valid filename leaks 'utf8'
* I'm uncertain if checking for existence is the right way to perform list adding, I think we need at least some command line parsing, i.e.:
  - ignore everything starting with a dash
  - remove converted filenames from argv and argc, but only if they are not a parameter of -e (is that used by your shell integration?)
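The parsing rules listed in the comment above could look roughly like the following sketch in plain C. It is not the patch attached to this bug. The function name split_args() and the choice to fill two separate arrays are assumptions made for illustration, and only the -e option named in the thread is treated as taking a value.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch of the parsing rules discussed above, not the
 * actual patch. Splits argv[1..argc-1] into input files and the rest.
 * kept[]  receives argv[0], all options and the parameter of "-e";
 * files[] receives the remaining plain arguments.
 * Both arrays must have room for argc entries.
 * Returns the number of entries in files[]; *kept_count is the new argc. */
int split_args(int argc, char **argv, char **kept, int *kept_count,
               char **files)
{
    int i, nkept = 0, nfiles = 0;

    assert(argc >= 1 && argv != NULL && kept != NULL && files != NULL);
    assert(kept_count != NULL);
    kept[nkept++] = argv[0];                 /* program name stays */
    for (i = 1; i < argc; i++) {
        if (argv[i][0] == '-') {
            kept[nkept++] = argv[i];         /* an option, never a file */
            if (strcmp(argv[i], "-e") == 0 && i + 1 < argc)
                kept[nkept++] = argv[++i];   /* value of -e is not an input */
        } else {
            files[nfiles++] = argv[i];       /* candidate input file */
        }
    }
    *kept_count = nkept;
    return nfiles;
}
```

Any other option that takes a separate value would need the same handling as -e, otherwise its value would end up in the file list.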
Currently the shell integration is only using -t, but still -e should be ignored. I'll rework the patch to overcome the above problems. Do you know if it's "evil" to manipulate __argv and __argc directly? Should I make a copy instead?
Created attachment 150611 [details] [review] Additional patch to address the problems identified by Hans
* Fixed utf8 leak
* Ignore arguments starting with "-"
* Ignore files passed after "-e"
* Remove "verified" files from args passed to app_init()
If you would prefer a single patch, let me know. This is how it came out of git.
Created attachment 151608 [details] [review] Allow to pass wide character filenames to dia-win-remote.exe on the commandline Additional patch that adds wide character support to dia-win-remote.exe. From my point of view, this solves the localized character problem in Dia. Of course, a fix for #522131 is the cleaner solution. I'll rework the code once glib is updated in this respect.
The new patch (for app_init) looks like it has a problem converting the parameter directly after a conversion. If at all, I think it would be easier to steadily fill a new array with non-filenames (better: not converted), rather than modifying __argv in place.
But I was pondering a different idea, namely allowing URIs on the standard command line. This should have multiple advantages:
- the code in question could be tested by simple shell scripts (of course the percent sign needs proper escaping)
- not platform specific and slightly simpler code
- should also be possible to extend for the input and output directory switches as well as -e
Given that non-locale filenames on the command line would (and can) only be created by dia-win-remote there should be no problem to produce correct URIs and the user would not need to be bothered with them.
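As an illustration of the URI idea, the sketch below percent-encodes a UTF-8 path into a file URI using only ASCII characters, so it survives any command line encoding. It is a simplified stand-in written for this note, not code from the patches in this bug. The name path_to_file_uri() is hypothetical, and the input is assumed to be a UTF-8 path that already uses forward slashes. GLib offers g_filename_to_uri() for the real job.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not the patch from this bug: build a file URI from
 * a UTF-8 path that uses forward slashes, e.g. "C:/dir/name.dia".
 * Unreserved ASCII characters plus '/' and ':' are copied; every other
 * byte (space, '%', UTF-8 multi-byte sequences) becomes %XX.
 * Returns 0 on success, -1 if the output buffer is too small. */
int path_to_file_uri(const char *path, char *out, size_t out_size)
{
    static const char hex[] = "0123456789ABCDEF";
    static const char prefix[] = "file:///";
    const unsigned char *p = (const unsigned char *)path;
    size_t n = sizeof prefix - 1;

    assert(path != NULL && out != NULL);
    if (out_size <= n)
        return -1;
    memcpy(out, prefix, n);
    for (; *p != '\0'; p++) {
        int plain = (*p >= 'A' && *p <= 'Z') || (*p >= 'a' && *p <= 'z') ||
                    (*p >= '0' && *p <= '9') ||
                    strchr("-._~/:", *p) != NULL;
        /* Keep one byte in reserve for the terminating NUL. */
        if (n + (plain ? 1 : 3) >= out_size)
            return -1;
        if (plain) {
            out[n++] = (char)*p;
        } else {
            out[n++] = '%';
            out[n++] = hex[*p >> 4];
            out[n++] = hex[*p & 0x0F];
        }
    }
    out[n] = '\0';
    return 0;
}
```

For the path from the earlier console output, "C:/Dokumente und Einstellungen/hb/zu blöd.dia" in UTF-8, this produces file:///C:/Dokumente%20und%20Einstellungen/hb/zu%20bl%C3%B6d.dia, the same form libxml2 printed in its error message.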
Created attachment 151877 [details] [review] Unfinished patch interpreting URIs The current implementation of the patch only deals with filenames. Also it has a problem when the GLib filename encoding is not utf-8.
To fix this bug for the limited Dia installer/dia-win-remote interface, the current URI patch is sufficient. I'm currently working on a dia-win-remote patch that passes URIs. For "problem when the GLib filename encoding is not utf-8": why not simply pass the g_filename_from_uri() result through g_filename_to_utf8() - if that's not the problem, could you explain in more detail?
Created attachment 152096 [details] [review] URI-encodes filenames passed from dia-win-remote.exe to diaw.exe
There is a leak passing the result of g_filename_from_utf8() directly to g_filename_to_uri(), but the conversion is simply not necessary on win32, so I'll remove it when committing. Regarding comment #41: the problem I see in my patch is nothing new. We were always converting filename to utf8 and passing the result to g_file_test(). Now if the GLib file encoding is "on-disk file name bytes on Unix" [1] and that is not utf-8 we would need an additional conversion before the g_file_test(). Given that no one seems to have run into this problem I may do the "conversion" as you say - although it is quite pointless for the case where the GLib filename encoding is utf-8 (just a strdup) and wrong for the other case;)
[1] http://www.gtk.org/api/2.6/glib/glib-Character-Set-Conversion.html#g-filename-to-uri
Steffen, something is missing with your patch (id=152096), it gives:
dia-win-remote.c(276) : warning C4133: 'function' : incompatible types - from 'char *' to 'unsigned short *'
and the glib-2.0.lib should not be hardcoded in makefile.msc. If the DnD case worked for you just adding (LPWSTR) should be enough.
I also got the warning, things worked for me nevertheless. The DnD case works - please use diaw.exe --integrated for the initial start. I'll have a look again at the warning and the hardcoded glib-2.0.lib
No need to look at makefile.msc. I've already changed it to use $(GLIB_LIBS)
Thanks. The strange thing is that if I change line 276 to use LPWSTR instead of LPSTR, dia-win-remote.exe crashes on me with a NULL pointer. Switching back to LPSTR, things just work.
Not strange but C;) By s/LPSTR/LPWSTR/ the resulting pointer gets incremented by twice as many bytes. I was just adding an extra cast to avoid the warning. Pushed to master and soon to dia-0-97 branch.
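The effect described above is ordinary C pointer arithmetic: p + n advances by n elements, so the byte distance depends on the pointed-to type. The sketch below shows this with portable stand-ins for the Win32 types (char for what LPSTR points to, uint16_t for the 2-byte units LPWSTR points to). The helper names are hypothetical and the actual line 276 of dia-win-remote.c is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the Win32 types so the sketch builds anywhere (assumption):
 * LPSTR is a pointer to 1-byte chars, LPWSTR to 2-byte UTF-16 units. */
typedef char     *NARROW_PTR;
typedef uint16_t *WIDE_PTR;

/* Distance in bytes after advancing a narrow pointer by n elements. */
size_t bytes_advanced_narrow(size_t n)
{
    char buf[64] = {0};
    NARROW_PTR p = buf;

    assert(n <= sizeof buf);
    return (size_t)((p + n) - buf);
}

/* Distance in bytes after advancing a wide pointer by n elements. */
size_t bytes_advanced_wide(size_t n)
{
    uint16_t buf[64] = {0};
    WIDE_PTR p = buf;

    assert(n <= 64);
    return (size_t)((char *)(p + n) - (char *)buf);
}
```

Changing the declared type of a pointer therefore changes where every later p + offset lands, whereas a cast applied only at the call site silences the warning without touching the arithmetic.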
*** Bug 591302 has been marked as a duplicate of this bug. ***
I've just released dia-0.97.1 - windows version should follow soon;)