GNOME Bugzilla – Bug 535473
Manual specification of encoding for .XLS
Last modified: 2011-08-14 22:23:14 UTC
Please describe the problem:
ssconvert is converting an xls file from biff7 to biff8. The encoding in the source file is cp1251 (Cyrillic). The resulting file can be opened with OpenOffice, but the encoding in it is wrong: a string looks like "ÀÊÒ ÑÂÅÐÊÈ ÐÀÑ×ÅÒÎÂ" instead of "АКТ СВЕРКИ РАСЧЕТОВ".

Steps to reproduce:
1. ssconvert -T Gnumeric_Excel:excel_biff8 a.xls b.xls

Actual results:
In console: Reading file:****.xls Excel 95 Writing file:****.xls
In the resulting file the encoding is changed (compared with the source file).

Expected results:
The encoding of the resulting file should be the same as in the source file.

Does this happen every time?
yes

Other information:
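
For illustration, the garbling reported above is what appears when cp1251 bytes are read as a Western single-byte encoding. The sketch below uses Python only to reproduce the effect on the quoted string; it is not Gnumeric code, and the variable names are made up for the example.

```python
# Illustration only: reproduce the garbling from the report.
original = "АКТ СВЕРКИ РАСЧЕТОВ"

# Bytes as they would be stored in a cp1251-encoded BIFF7 string.
raw = original.encode("cp1251")

# Interpreting those same bytes as Latin-1 yields the broken text.
garbled = raw.decode("latin-1")
print(garbled)

# Reversing the wrong step recovers the Cyrillic text.
repaired = garbled.encode("latin-1").decode("cp1251")
print(repaired)
```

The round trip is lossless here, which suggests the bytes themselves survive the conversion and only the choice of decoder is wrong.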
Created attachment 111727 [details] source file
Created attachment 111728 [details] resulting xls file
Interesting. The source file is missing a CODEPAGE record and contains no metadata. What generated this? There is indeed a case to be made for allowing manual specification of the import encoding for xls here. The trick would be that XL operates on codepages and we focus on locale.
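
A rough sketch of how one might check for a CODEPAGE record, assuming the workbook stream has already been extracted from the OLE2 container into a bytes object. BIFF records are a 2-byte id and a 2-byte length (both little-endian) followed by the data, and CODEPAGE has id 0x0042. The function name `find_codepage` is hypothetical, and this is not how Gnumeric's importer is written.

```python
import struct

CODEPAGE_RECORD = 0x0042  # BIFF record id for CODEPAGE


def find_codepage(stream: bytes):
    """Return the codepage number from a raw BIFF record stream, or None.

    Assumes `stream` is the already-extracted workbook stream, not the
    OLE2 container file itself.
    """
    offset = 0
    while offset + 4 <= len(stream):
        record_id, length = struct.unpack_from("<HH", stream, offset)
        offset += 4
        if record_id == CODEPAGE_RECORD and length >= 2 and offset + 2 <= len(stream):
            (codepage,) = struct.unpack_from("<H", stream, offset)
            return codepage
        offset += length  # skip this record's data
    return None
```

A file like the attached one would return None here, which is the case where the importer has to guess or ask.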
(In reply to comment #3)
> Interesting. The source file is missing a CODEPAGE record and contains no
> metadata. What generated this?
>
> There is indeed a case to be made for allowing manual specification of the
> import encoding for xls here. The trick would be that XL operates on codepages
> and we focus on locale.

I received this by e-mail; an old version of Office, I presume. It is definitely not auto-generated.
I suspect this file was generated by "1C Enterprise". The FONT record has charset 0xCC (Cyrillic). The partial fix for bug #304007 seems to fix this one too (tested with the attached file). New files generated by "1C" are not fixed by #304007 at the moment, because "1C" seems to have broken their XLS generator.
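
To illustrate the idea of using the FONT charset as a hint when no CODEPAGE record is present, here is a minimal sketch that maps Windows font charset values to Python codec names. The table uses the standard Windows charset constants; the names `FONT_CHARSET_TO_CODEC` and `codec_for_font_charset` are hypothetical, and this is not the code from the #304007 patches.

```python
# Windows font charset values and the ANSI codepage each one implies.
FONT_CHARSET_TO_CODEC = {
    0x00: "cp1252",  # ANSI_CHARSET
    0x80: "cp932",   # SHIFTJIS_CHARSET
    0x81: "cp949",   # HANGUL_CHARSET
    0x86: "cp936",   # GB2312_CHARSET
    0x88: "cp950",   # CHINESEBIG5_CHARSET
    0xA1: "cp1253",  # GREEK_CHARSET
    0xA2: "cp1254",  # TURKISH_CHARSET
    0xA3: "cp1258",  # VIETNAMESE_CHARSET
    0xB1: "cp1255",  # HEBREW_CHARSET
    0xB2: "cp1256",  # ARABIC_CHARSET
    0xBA: "cp1257",  # BALTIC_CHARSET
    0xCC: "cp1251",  # RUSSIAN_CHARSET (Cyrillic)
    0xDE: "cp874",   # THAI_CHARSET
    0xEE: "cp1250",  # EASTEUROPE_CHARSET
}


def codec_for_font_charset(charset: int, fallback: str = "cp1252") -> str:
    """Guess a codec from a FONT record charset; unknown values use the fallback."""
    return FONT_CHARSET_TO_CODEC.get(charset, fallback)


# The charset reported for this file points at cp1251.
print(codec_for_font_charset(0xCC))
```

A heuristic like this only helps when the fonts carry a meaningful charset; a file whose fonts all report charset 0 gives it nothing to work with.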
Valek, would you please attach a file to this bug report that is not correctly readable after the recent changes (patches to bug #304007), i.e. one of those new "1C" files.
Created attachment 193735 [details] File extracted from fdo#33100
The attached file was extracted from the clipboard buffer attached to https://bugs.freedesktop.org/show_bug.cgi?id=33100 It was (most likely) made by a newer version of "1C Enterprise" than the one that produced the file initially attached to #535473. No codepage, charsets are 0, and font names do not have "Cyr".
I have added a file opener for xls-type files that require a codepage specification. The user can then specify the charset to be used. Note that this depends on a current goffice. This problem has been fixed in the development version. The fix will be available in the next major software release. Thank you for your bug report.