Bug 772411 - Corrupted characters in Greek filenames when saving a pdf report.
Status: RESOLVED OBSOLETE
Product: glib
Classification: Platform
Component: i18n
Version: 2.42.x
Hardware: Other Windows
Importance: Normal normal
Target Milestone: ---
Assigned To: gtkdev
QA Contact: gtkdev
Depends on:
Blocks:
 
 
Reported: 2016-10-04 14:23 UTC by Nikos Charonitakis
Modified: 2018-05-24 19:08 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
screenshot of saved report showing the corrupted characters (5.25 KB, image/png)
2016-10-04 14:23 UTC, Nikos Charonitakis
status bar "file last modified date" also with corrupted Greek characters (3.15 KB, image/png)
2016-10-05 09:18 UTC, Nikos Charonitakis

Description Nikos Charonitakis 2016-10-04 14:23:43 UTC
Created attachment 336905 [details]
screenshot of saved report showing the corrupted characters

I saved a PDF report with a Greek filename, and the Greek characters in the filename were converted into something unreadable. I think I have seen the same problem at least since 2.6.11.
Comment 1 John Ralls 2016-10-04 14:27:57 UTC
What version of Windows, what locale settings, and what was the original filename?
Comment 2 Nikos Charonitakis 2016-10-05 09:18:52 UTC
Created attachment 336968 [details]
status bar "file last modified date" also with corrupted  Greek characters

This may be a related problem.
Comment 3 Nikos Charonitakis 2016-10-05 09:23:07 UTC
I tested two machines, both running Windows 10; one has a Greek UI and the other an English UI. Both had the problem I reported. The filename in the screenshot example is: "Έξοδα στο χρόνο" ("Expense over time")
Comment 4 John Ralls 2016-10-05 14:41:44 UTC
Hmm. It's taking the UTF-8 string and presenting it as CP-1253 (the old Windows 3/Me way to display Greek). To demonstrate, visit http://string-functions.com/encodedecode.aspx, paste Έξοδα στο χρόνο into the box, select "utf-8" for "Encode with" and "Windows-1253" for "Decode with" then press the "Encode/Decode" button. It will display the same string as your first screen shot.
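
The same conversion can be reproduced locally. This is a minimal Python sketch, assuming Python's built-in `cp1253` codec is a close enough stand-in for Windows-1253. That codec leaves a few byte values undefined, so those are shown as U+FFFD and the output may differ slightly from what the website or the screenshot displays:

```python
# Reproduce the mojibake: UTF-8 bytes misread as Windows-1253 (CP-1253).
name = "Έξοδα στο χρόνο"

# The bytes GnuCash holds internally for the filename.
utf8_bytes = name.encode("utf-8")

# Python's cp1253 codec leaves some byte values undefined (0x88, for
# example), so errors="replace" substitutes U+FFFD for those bytes.
mojibake = utf8_bytes.decode("cp1253", errors="replace")

print(mojibake)
```

Each Greek letter occupies two bytes in UTF-8, and a single-byte code page turns every byte into its own character, which is why the corrupted name is roughly twice as long as the original.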

What's strange about that is that Windows Explorer should be using UTF-16 and the Gtk output functions should be taking care of converting the UTF-8 used internally to UTF-16. The only place code pages should be involved is in a CMD window.

The corrupt date string is even stranger. What's Greek for "Tuesday, October 4, 2016 at 4:15PM"?
Comment 5 Nikos Charonitakis 2016-10-06 12:33:05 UTC
I'll try to find and install an old GnuCash Windows version that worked properly, so we can see in which version the problem begins.

"Τρίτη, 4 Οκτωβρίου, 2016 στις 4:15μμ" (Greek date string you asked)
Comment 6 John Ralls 2016-10-07 11:06:00 UTC
You'll find all of the old releases going back to 2.2.0 (which probably won't be able to open your file) at https://sourceforge.net/projects/gnucash/files/gnucash%20%28stable%29/.
Comment 7 Nikos Charonitakis 2016-10-08 07:00:01 UTC
I tested 2.6.1 and 2.6.7, and the problem has been there since 2.6.1. I had the impression that earlier versions were unaffected, but that proved not to be the case.
From memory, I think that in 2.4.x or 2.2.x corrupted Greek characters also appeared in the GnuCash tab title, but this was resolved at some point.
Comment 8 John Ralls 2016-10-08 07:59:18 UTC
Are you able to test on other versions of Windows?
Comment 9 Nikos Charonitakis 2016-10-08 16:03:16 UTC
- I also tested 2.6.3: the date in "file last modified" works as expected. Save as Greektittle.pdf corrupts the characters.

- I tested 2.4.15: there is no "save as PDF" option in the menu. The title in the HTML export works OK with Greek characters.
I can test other versions too. Do you have any requests?
Comment 10 John Ralls 2016-10-08 20:14:02 UTC
Did you check file last modified in 2.6.7? Oh, and do you mean File>Export as PDF or the "Save as PDF" from the print dialog?
Comment 11 Nikos Charonitakis 2016-10-12 18:28:53 UTC
I checked 2.6.7, and "last modified" is not working properly. Both File>Export and "Save as PDF" produce the same error...
Comment 12 John Ralls 2016-10-23 20:44:21 UTC
On debugging I find that GLib thinks that on Windows the Greek Locale is not-UTF-8 so g_date_time_format_locale() passes the UTF-8 strings retrieved from el.mo through g_locale_to_utf8() which naturally munges the encoding.

Reassigning to GLib.
Comment 13 John Ralls 2016-10-23 21:09:19 UTC
Notes for pursuing this:

According to its documentation, g_get_charset, which is what g_date_time_format uses to decide whether the locale is UTF-8 or not, always returns the old-fashioned code page in effect for the terminal, so its use on Windows will be wrong for anything other than a terminal application. Moreover, in the case of g_date_time_format, where the strings come from UTF-8-encoded message catalogs, the transcoding is inherently wrong.
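
To make the date-string case concrete, here is a rough Python model of the decision described above. It is not GLib code: `to_utf8`, `locale_charset`, and `source_is_utf8` are hypothetical names used only for illustration, and Python's `cp1253` codec stands in for the Greek code page on Windows.

```python
def to_utf8(raw: bytes, locale_charset: str, source_is_utf8: bool) -> str:
    """Hypothetical model of the transcoding decision, not GLib's code."""
    # Strings that are already UTF-8 (e.g. from a message catalog) need
    # no locale conversion, whatever the locale charset claims to be.
    if source_is_utf8 or locale_charset.lower() in ("utf-8", "utf8"):
        return raw.decode("utf-8")
    # Otherwise treat the bytes as locale-encoded and convert them.
    return raw.decode(locale_charset, errors="replace")


day = "Τρίτη"
catalog_bytes = day.encode("utf-8")  # what a UTF-8 .mo file would hold

# Behaviour as described in the bug: the locale is reported as non-UTF-8,
# so the UTF-8 catalog string is transcoded and comes out munged.
munged = to_utf8(catalog_bytes, "cp1253", source_is_utf8=False)

# Skipping the conversion for catalog strings keeps the text intact.
intact = to_utf8(catalog_bytes, "cp1253", source_is_utf8=True)

print(munged)
print(intact)
```

The conversion itself is still correct for bytes that really are in the locale encoding; the problem in this sketch is only that it is applied to data that was UTF-8 to begin with.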

I haven't done similar debugging for the filename case, but GnuCash uses GtkPrintOperation to do the printing. In that case the situation is reversed: The code should be transcoding (ideally to UTF-16 rather than the local codepage) but is apparently passing the UTF-8-encoded string directly to the filesystem. The filesystem, seeing single-byte characters, interprets the file name as being encoded in the current code page.
Comment 14 Philip Withnall 2017-09-13 12:18:08 UTC
(In reply to John Ralls from comment #13)
> According to its documentation, g_get_charset, which is what
> g_date_time_format uses to decide whether the locale is UTF-8 or not,
> always returns the old-fashioned code page in effect for the terminal, so
> its use on Windows will be wrong for anything other than a terminal
> application. Moreover, in the case of g_date_time_format, where the strings
> come from UTF-8-encoded message catalogs, the transcoding is inherently wrong.

That’s bug #782578.
Comment 15 GNOME Infrastructure Team 2018-05-24 19:08:50 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/1209.