Bug 304007 - troubles with some russian xls
Status: RESOLVED FIXED
Product: Gnumeric
Classification: Applications
Component: import/export MS Excel (tm)
Version: 1.4.x
Hardware: Other All
Importance: Normal normal
Target Milestone: ---
Assigned To: Jody Goldberg
Jody Goldberg
Depends on:
Blocks:
 
 
Reported: 2005-05-13 07:14 UTC by Ed
Modified: 2011-08-12 22:04 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
xls file example (8.00 KB, application/vnd.ms-excel)
2005-05-13 08:19 UTC, Ed
screenshot from MS Excel (15.79 KB, image/png)
2005-05-13 08:20 UTC, Ed
Modern example from '1C Enterprise' (4.50 KB, application/vnd.ms-excel)
2011-08-02 23:56 UTC, Valek Filippov
This patch adds setting Codepage based on charset field in FONT record (3.02 KB, patch)
2011-08-04 22:27 UTC, Valek Filippov
Patch with fixed codepage for Apple Roman encoding (3.02 KB, patch)
2011-08-05 12:51 UTC, Valek Filippov
Multilingual document (13.50 KB, application/vnd.ms-excel)
2011-08-06 16:24 UTC, Urmas
How it should look (14.44 KB, application/pdf)
2011-08-06 16:27 UTC, Urmas
This one seems to handle 'charsets.xls' properly. (15.05 KB, patch)
2011-08-06 22:08 UTC, Valek Filippov
With NEWS and ChangeLog (16.15 KB, patch)
2011-08-11 20:41 UTC, Valek Filippov, committed
As per discussion on IRC, store codepage, convert charset in read_FONT, call gnm_font_override_codepage for charset 0. (8.50 KB, patch)
2011-08-12 21:27 UTC, Valek Filippov, committed

Description Ed 2005-05-13 07:14:03 UTC
Distribution: Debian 3.1
Package: Gnumeric
Severity: normal
Version: GNOME2.8.3 1.4.x
Gnome-Distributor: Debian
Synopsis: troubles with some russian xls
Bugzilla-Product: Gnumeric
Bugzilla-Component: import/export MS Excel (tm)
Bugzilla-Version: 1.4.x
BugBuddy-GnomeVersion: 2.0 (2.8.1)
Description:
Description of the crash:
Files created by the 1C program (a very popular Russian program) are
displayed with the wrong charset.

Steps to reproduce the crash:
1. Export some data from 1C in xls format.
2. Open this file in Gnumeric.
3. All Cyrillic letters are wrong (as if a Western charset were used instead of
cp-1251).
4. After some work with this file, Gnumeric crashes.
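
The symptom in step 3 is what one would expect when CP1251 bytes are decoded with a Western codepage. The following is a minimal Python sketch of that effect only; it is not taken from Gnumeric, and the sample string is arbitrary.

```python
# Cyrillic text stored one byte per character in Windows-1251,
# as a BIFF5-era writer would typically store it.
raw = "Привет".encode("cp1251")

# Decoding with the Western codepage (Windows-1252) produces accented
# Latin letters instead of Cyrillic.
as_western = raw.decode("cp1252")

# Decoding with the codepage the writer actually used restores the text.
as_cyrillic = raw.decode("cp1251")

print(as_western)   # Ïðèâåò
print(as_cyrillic)  # Привет
```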

Expected Results:


How often does this happen?


Additional Information:
Microsoft Excel opens these files without any problem.
I attach an example file.


Debugging Information:

Backtrace was generated from '/usr/bin/gnumeric'

(no debugging symbols found)
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
`system-supplied DSO at 0xffffe000' has disappeared; keeping its
symbols.
[Thread debugging using libthread_db enabled]
[New Thread -1222850496 (LWP 21564)]
0xffffe410 in __kernel_vsyscall ()

Thread 1 (Thread -1222850496 (LWP 21564))

  • #0 __kernel_vsyscall
  • #1 __waitpid_nocancel
    from /lib/tls/i686/cmov/libpthread.so.0
  • #2 libgnomeui_module_info_get
    from /usr/lib/libgnomeui-2.so.0
  • #3 <signal handler called>
  • #4 gtk_tree_model_get_valist
    from /usr/lib/libgtk-x11-2.0.so.0
  • #5 gtk_tree_model_get
    from /usr/lib/libgtk-x11-2.0.so.0
  • #6 editable_label_start_editing
  • #7 g_cclosure_marshal_VOID__VOID
    from /usr/lib/libgobject-2.0.so.0
  • #8 g_closure_invoke
    from /usr/lib/libgobject-2.0.so.0
  • #9 g_signal_emit_by_name
    from /usr/lib/libgobject-2.0.so.0
  • #10 g_signal_emit_valist
    from /usr/lib/libgobject-2.0.so.0
  • #11 g_signal_emit
    from /usr/lib/libgobject-2.0.so.0
  • #12 gtk_tree_selection_unselect_all
    from /usr/lib/libgtk-x11-2.0.so.0
  • #13 gtk_tree_view_set_search_equal_func
    from /usr/lib/libgtk-x11-2.0.so.0
  • #14 _gtk_tree_view_queue_draw_node
    from /usr/lib/libgtk-x11-2.0.so.0
  • #15 gtk_tree_view_get_type
    from /usr/lib/libgtk-x11-2.0.so.0
  • #16 _gtk_marshal_BOOLEAN__BOXED
    from /usr/lib/libgtk-x11-2.0.so.0
  • #17 g_cclosure_new_swap
    from /usr/lib/libgobject-2.0.so.0
  • #18 g_closure_invoke
    from /usr/lib/libgobject-2.0.so.0
  • #19 g_signal_emit_by_name
    from /usr/lib/libgobject-2.0.so.0
  • #20 g_signal_emit_valist
    from /usr/lib/libgobject-2.0.so.0
  • #21 g_signal_emit
    from /usr/lib/libgobject-2.0.so.0
  • #22 gtk_widget_send_expose
    from /usr/lib/libgtk-x11-2.0.so.0
  • #23 gtk_window_propagate_key_event
    from /usr/lib/libgtk-x11-2.0.so.0
  • #24 gtk_window_propagate_key_event
    from /usr/lib/libgtk-x11-2.0.so.0
  • #25 _gtk_marshal_BOOLEAN__BOXED
    from /usr/lib/libgtk-x11-2.0.so.0
  • #26 g_cclosure_new_swap
    from /usr/lib/libgobject-2.0.so.0
  • #27 g_closure_invoke
    from /usr/lib/libgobject-2.0.so.0
  • #28 g_signal_emit_by_name
    from /usr/lib/libgobject-2.0.so.0
  • #29 g_signal_emit_valist
    from /usr/lib/libgobject-2.0.so.0
  • #30 g_signal_emit
    from /usr/lib/libgobject-2.0.so.0
  • #31 gtk_widget_send_expose
    from /usr/lib/libgtk-x11-2.0.so.0
  • #32 gtk_propagate_event
    from /usr/lib/libgtk-x11-2.0.so.0
  • #33 gtk_main_do_event
    from /usr/lib/libgtk-x11-2.0.so.0
  • #34 _gdk_events_queue
    from /usr/lib/libgdk-x11-2.0.so.0
  • #35 g_main_depth
    from /usr/lib/libglib-2.0.so.0
  • #36 g_main_context_dispatch
    from /usr/lib/libglib-2.0.so.0
  • #37 g_main_context_dispatch
    from /usr/lib/libglib-2.0.so.0
  • #38 g_main_loop_run
    from /usr/lib/libglib-2.0.so.0
  • #39 bonobo_main
    from /usr/lib/libbonobo-2.so.0
  • #40 main
  • #0 __kernel_vsyscall





------- Bug moved to this database by unknown@bugzilla.gnome.org 2005-05-13 07:14 UTC -------


Bugreport had an attachment. This cannot be imported to Bugzilla.
Contact bugmaster@gnome.org if you are willing to write a patch for this.
The original reporter of this bug does not have
   an account here. Reassigning to the person who moved
   it here, unknown@bugzilla.gnome.org.
   Previous reporter was spied@yandex.ru.

Comment 1 Ed 2005-05-13 08:19:29 UTC
Created attachment 46387 [details]
xls file example
Comment 2 Ed 2005-05-13 08:20:16 UTC
Created attachment 46388 [details]
screenshot from MS Excel
Comment 3 Jody Goldberg 2005-05-14 22:31:06 UTC
What generated that file?
XL-95 does not render it correctly.
2k/XP renders something similar to your screenshot, but cannot export it and
reload the result.
OOo 2 does even worse than Gnumeric.
Comment 4 Ed 2005-05-14 22:54:28 UTC
> What generated that file ?

1C Enterprise, a very popular Russian bookkeeping program (for Windows).
I get some documents in this format via e-mail.

> XL-95 does not render it correctly.

What is wrong? AFAIK the Russian version of Excel 95 should render it correctly.

> 2k/XP renders something similar to your screenshot,

;)

> but can not export it and reload the result.

I don't understand. For me, it works: I can open this file in Excel, do "Save As",
and open the result in Excel or Gnumeric.
Comment 5 Morten Welinder 2005-05-16 13:30:54 UTC
jody: while it is probably not the cause of the crash, it looks like
excel_read_XF should set ->text_dir in all cases.
Comment 6 Morten Welinder 2005-05-16 13:53:17 UTC
I fixed that issue.  With that, nothing from Purify.
Comment 7 Jody Goldberg 2005-05-20 19:44:03 UTC
XL95 encoding is mostly correct, but every cell has 'wrap text' enabled, which renders
terribly.

XL2k/XP encoding looks correct everywhere, but if I save and reload in 2k or XP
the encoding and 'wrap text' are incorrect everywhere.
Comment 8 Urmas 2011-03-28 16:55:50 UTC
after six years 
bloat office 1 : 0 gnumeric
Comment 9 Valek Filippov 2011-08-02 23:56:53 UTC
Created attachment 193115 [details]
Modern example from '1C Enterprise'

'biff5' part of the CLP file was dumped from the file attached to this bug: https://bugs.freedesktop.org/show_bug.cgi?id=33100

Both LibO Calc and Gnumeric fail to convert Cyrillic text in it.

Somehow Calc opens "123.xls" attached here correctly.
Comment 10 Valek Filippov 2011-08-03 15:31:35 UTC
"123.xls" uses "0xCC" ('Cyrillic') in the 'charset' field of the 'Font' record.
Could gnumeric use it?

In addition, it would be nice to have a UI to select the encoding for biff5 files, like the one we have for text import, and configuration option switch(es) to use the Codepage record, or the charset from the Font record, or to force a custom encoding, etc.
Comment 11 Valek Filippov 2011-08-04 22:27:26 UTC
Created attachment 193281 [details] [review]
This patch adds setting Codepage based on charset field in FONT record
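
The idea of deriving a codepage from the FONT record's charset byte can be sketched as a lookup table. This is a Python illustration, not the patch itself (Gnumeric is written in C). The table is an illustrative subset of the Windows LOGFONT charset values, and `CHARSET_TO_CODEC` and `codec_for_charset` are hypothetical names. Treating charset 0 as Western is an assumption made here for simplicity.

```python
# Charset byte from a FONT record -> Python codec name (illustrative subset).
CHARSET_TO_CODEC = {
    0x00: "cp1252",     # ANSI; assumed to mean Western in this sketch
    0x4D: "mac_roman",  # Apple Roman
    0xA1: "cp1253",     # Greek
    0xA2: "cp1254",     # Turkish
    0xBA: "cp1257",     # Baltic
    0xCC: "cp1251",     # Cyrillic, the value reported for "123.xls"
    0xEE: "cp1250",     # Central European
}


def codec_for_charset(charset, default="cp1252"):
    """Return the codec implied by a charset byte, or the default if unknown."""
    return CHARSET_TO_CODEC.get(charset, default)


# Bytes of a Cyrillic string as stored in the file, decoded via charset 0xCC.
print(bytes([0xCF, 0xF0, 0xE8]).decode(codec_for_charset(0xCC)))  # При
```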
Comment 12 Valek Filippov 2011-08-05 12:51:56 UTC
Created attachment 193305 [details] [review]
Patch with fixed codepage for Apple Roman encoding
Comment 13 Valek Filippov 2011-08-06 00:07:40 UTC
https://bugzilla.gnome.org/show_bug.cgi?id=535473 is similar to this one and is fixed (for old 1C files) by the patch in comment #12.
Comment 14 Andreas J. Guelzow 2011-08-06 13:33:35 UTC
Review of attachment 193305 [details] [review]:

Please correct me if I am wrong, but it looks to me like the patch uses the font information to set the code page even if the file contains codepage information. In the case of a codepage record, that information has to govern.

If we are guaranteed that the codepage record comes after the font information we are fine; otherwise we might be replacing valid codepage info with the guess from the font information.
Comment 15 Valek Filippov 2011-08-06 13:57:55 UTC
Yes, you are right. That's the reason why I asked how to get the codepage value from read_FONT.
We are not guaranteed that the codepage comes after; in fact it seems to come before the fonts in normal files.
I think the better way would be to store the charset within the font and use it later.
Comment 16 Urmas 2011-08-06 16:23:29 UTC
Because the Excel 5 format is multilingual, the codepage from the font should override the codepage record. See example.
Comment 17 Urmas 2011-08-06 16:24:27 UTC
Created attachment 193351 [details]
Multilingual document
Comment 18 Urmas 2011-08-06 16:27:32 UTC
Created attachment 193352 [details]
How it should look
Comment 19 Valek Filippov 2011-08-06 17:58:24 UTC
(In reply to comment #16)
> Because Excel 5 format is multilingual, the codepage from the font should
> override codepage record. See example.

No, it shouldn't.

The current patch overrides the codepage with the charset from each successive font entry.
Font entries are grouped together in the 'workbook' substream, so by the time you start to deal with text, the codepage has been set from the charset in the last font record.
That works fine if a document can be handled as "one encoding per document".
Your document _sets_ the codepage to 1251 (hence I guess it was not generated by "1C") and mixes Greek, CE and Turkish font records, but it ends with a Russian font record.
So the end result would be "everything is Latin I + Cyrillic".
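
The "last font record wins" effect described in this comment can be modelled in a few lines. This Python sketch is a simplified model of the behaviour, not the patch code; `last_font_wins` and the two-entry charset table are hypothetical.

```python
CHARSET_TO_CODEC = {0xA1: "cp1253", 0xCC: "cp1251"}  # Greek, Cyrillic


def last_font_wins(font_charsets, default="cp1252"):
    """Model a single global codepage overwritten by every FONT record."""
    codec = default
    for charset in font_charsets:
        codec = CHARSET_TO_CODEC.get(charset, codec)
    return codec


# FONT records are read together, before any cell text: Greek, then Cyrillic.
codec = last_font_wins([0xA1, 0xCC])

# A cell formatted with the Greek font is decoded with the Cyrillic codec.
greek_cell = "αβγ".encode("cp1253")
print(codec)                     # cp1251
print(greek_cell.decode(codec))  # бвг
```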
Comment 20 Valek Filippov 2011-08-06 22:08:33 UTC
Created attachment 193360 [details] [review]
This one seems to handle 'charsets.xls' properly.

Read and store 'charset' from FONT record, use it for g_iconv from LABEL record.
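
A rough Python sketch of the per-font approach follows, under simplifying assumptions: each label points directly at a font index (the real format goes through further indirection), and `Font`, `Label`, `decode_label` and the charset table are hypothetical names. Python's `bytes.decode` stands in for the `g_iconv` call.

```python
from dataclasses import dataclass

# Illustrative subset: Greek, Turkish, Cyrillic, Central European.
CHARSET_TO_CODEC = {0xA1: "cp1253", 0xA2: "cp1254", 0xCC: "cp1251", 0xEE: "cp1250"}


@dataclass
class Font:
    name: str
    charset: int  # stored when the FONT record is read


@dataclass
class Label:
    font_index: int  # simplification: direct reference to a font
    raw: bytes       # 8-bit string bytes from the LABEL record


def decode_label(label, fonts, workbook_codec="cp1252"):
    """Decode label text with its own font's charset, else the workbook codec."""
    charset = fonts[label.font_index].charset
    codec = CHARSET_TO_CODEC.get(charset, workbook_codec)
    return label.raw.decode(codec, errors="replace")


fonts = [Font("Arial Greek", 0xA1), Font("Arial Cyr", 0xCC)]
labels = [Label(0, "αβγ".encode("cp1253")), Label(1, "где".encode("cp1251"))]
decoded = [decode_label(label, fonts) for label in labels]
print(decoded)  # ['αβγ', 'где']
```

Because the codec is chosen per string rather than per document, the order of the FONT records no longer affects the result.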
Comment 21 Andreas J. Guelzow 2011-08-11 06:17:18 UTC
NEWS and Changelog entries for the last patch?
Comment 22 Valek Filippov 2011-08-11 20:41:59 UTC
Created attachment 193662 [details] [review]
With NEWS and ChangeLog
Comment 23 Andreas J. Guelzow 2011-08-11 22:42:04 UTC
I have committed the last patch. What is left from this bug report?
Comment 24 Valek Filippov 2011-08-11 22:59:39 UTC
UI to force encoding.
Also it would be nice to have a command line option for ssconvert.
Comment 25 Andreas J. Guelzow 2011-08-11 23:09:33 UTC
UI to specify encoding is bug #535473. Command line option for ssconvert would be closely related to that (so I would also consider it part of bug #535473).

So we do not need two such bugs, closing this one.
Comment 26 Valek Filippov 2011-08-12 21:27:35 UTC
Created attachment 193731 [details] [review]
As per discussion on IRC, store codepage, convert charset in read_FONT, call gnm_font_override_codepage for charset 0.

This one seems to convert file from ubuntu#262777 properly.
Comment 27 Andreas J. Guelzow 2011-08-12 21:35:25 UTC
Review of attachment 193731 [details] [review]:

We should not call gnm_font_override_codepage twice but remember the result from the first call.
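
One possible reading of the fallback order in comment 26, together with this review remark, is sketched below in Python. Everything here is an assumption: `override_codepage_for_font` is a hypothetical stand-in for `gnm_font_override_codepage`, whose real behaviour is not described in this report, and the font-name table is invented for illustration.

```python
# Illustrative subset: Greek, Cyrillic, Central European.
CHARSET_TO_CODEPAGE = {0xA1: 1253, 0xCC: 1251, 0xEE: 1250}

# Invented table; the real override rules are not given in this report.
FONT_NAME_OVERRIDES = {"Arial Cyr": 1251}


def override_codepage_for_font(font_name):
    """Hypothetical stand-in: return a codepage for the font name, or None."""
    return FONT_NAME_OVERRIDES.get(font_name)


def font_codepage(charset, font_name, workbook_codepage):
    """Pick the codepage for one font when its FONT record is read."""
    if charset != 0:
        # An explicit charset decides; unknown values fall back to the workbook.
        return CHARSET_TO_CODEPAGE.get(charset, workbook_codepage)
    # Charset 0: ask for an override once and reuse the stored result.
    override = override_codepage_for_font(font_name)
    return override if override is not None else workbook_codepage


print(font_codepage(0xCC, "Arial", 1252))   # 1251
print(font_codepage(0, "Arial Cyr", 1252))  # 1251
print(font_codepage(0, "Arial", 1252))      # 1252
```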
Comment 28 Andreas J. Guelzow 2011-08-12 22:04:19 UTC
Review of attachment 193731 [details] [review]:

committed with minor changes