GNOME Bugzilla – Bug 724399
xls export hangs
Last modified: 2014-02-17 17:16:23 UTC
Created attachment 269166 sample gnumeric file

If I save the attached file as an xls file (MS Excel™ 97/2000/XP & 5.0/95), the export hangs. Interrupting it always appears to land in the same location:
+ Trace 233177
The obvious guess is that something is going wrong with "into Back Corté"
It's the Excel95 saving that goes crazy. It can be triggered like this:

./ssconvert -T Gnumeric_Excel:excel_biff7 ~/Download/tango-gold.gnumeric /tmp/ttt.xls
Note that this problem does not occur with Gnumeric 1.12.6. If I recall correctly, the string export to xls was changed after 1.12.6.
The problem arose with the patch for bug 715110.

Jean:

(gdb) p avail
$18 = 1
(gdb) p offset
$19 = 0
(gdb) p out_bytes
$20 = 17
(gdb) p tmp
$21 = 0x637ea0 "\351~c"
(gdb) p tmp-10
$22 = 0x637e96 " Back Cort\351~c"

When we do...

avail = MIN (avail, output_len);
avail = (avail - offset) / 2 * 2 + offset; /* we need to export an even byte number */

...we end up with avail==0 and see no progress.

Jean: where did "we need to export an even byte number" come from? Does it apply to pre-BIFF8?
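To make the arithmetic concrete, here is a minimal sketch of the rounding step with the values from the gdb session (avail = 1, offset = 0). The function names and the writer loop are hypothetical and only illustrate why a chunk size of 0 means no progress; they are not the actual Gnumeric code.

```c
#include <stddef.h>

/*
 * Same arithmetic as the quoted line: round the payload (the bytes after
 * the first `offset` bytes) down to an even count.
 * Precondition for this sketch: avail >= offset.
 */
static size_t
round_payload_to_even (size_t avail, size_t offset)
{
	return (avail - offset) / 2 * 2 + offset;
}

/*
 * Hypothetical writer loop with offset == 0, as in the gdb session.
 * Emits `total` bytes in chunks of at most `avail` bytes and returns the
 * number of chunks, or -1 when a chunk comes out empty. The real loop has
 * no such check, so an empty chunk means it spins forever.
 */
static int
count_chunks (size_t total, size_t avail)
{
	int chunks = 0;

	while (total > 0) {
		size_t n = round_payload_to_even (avail < total ? avail : total, 0);

		if (n == 0)
			return -1; /* no progress: this is the hang */
		total -= n;
		chunks++;
	}
	return chunks;
}
```

With one byte left to write, round_payload_to_even (1, 0) is 0, so count_chunks (1, 1) reports the stall.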
I don't remember; as usual, there are not enough comments. Maybe it's just wrong.
Created attachment 269224 Proposed patch

Maybe we need an even number of chars only if we split the string, and never for the last run.
What we need is a test suite covering strings. From a QA point-of-view we shouldn't be able to make this kind of mistake. I think we can do this by round-trip testing:

gnumeric -> biff7 -> gnumeric
gnumeric -> biff8 -> gnumeric
gnumeric -> ods -> gnumeric
gnumeric -> xlsx -> gnumeric

...using a file containing (at least)

1. Simple short ascii strings. Odd and even lengths.
2. A very long (say, 10k characters) ascii string.
3. A short non-ascii string.
4. A very long non-ascii string.
5. A very long non-ascii string with the first (say) 4k being ascii.
6. Ditto with the final 4k characters.
7. An empty string
8. A selection of far-east language strings.
9. Non-ascii strings with ISO-8859-1 representation. Odd and even lengths.
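As a rough illustration of how the long test strings in the list above could be produced and compared, here is a small C sketch. Both helpers are hypothetical; the real tests would drive ssconvert rather than compare strings in-process.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: returns a malloc'ed, NUL-terminated string of
 * exactly `len` bytes made by repeating `pattern`. Returns NULL on
 * allocation failure or if `pattern` is empty. The caller frees it.
 */
static char *
make_repeated (const char *pattern, size_t len)
{
	size_t plen = strlen (pattern);
	size_t i;
	char *s;

	if (plen == 0)
		return NULL;
	s = malloc (len + 1);
	if (s == NULL)
		return NULL;
	for (i = 0; i < len; i++)
		s[i] = pattern[i % plen];
	s[len] = '\0';
	return s;
}

/* A round trip passes for one cell if the string read back is byte-identical. */
static int
round_trip_ok (const char *written, const char *read_back)
{
	return strcmp (written, read_back) == 0;
}
```

For example, make_repeated ("abcd", 10000) gives the 10k ascii case. For a multi-byte UTF-8 pattern, len should be a multiple of the pattern's byte length so that no character is cut in half.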
Tests are in: t6500-strings.pl

Without patch, we get a hang. With patch, we get criticals:

| (/home/welinder/gnome/gnumeric/src/.libs/lt-ssconvert:18729): gnumeric:read-WARNING **: File is most likely corrupted.
| (Condition "q->length - 8 >= in_len" failed in excel_read_LABEL.)

These appear to relate to all the long strings.
I am tempted to release a new version today with or without fixes for the long-string issue. I don't think that issue is new.
Not new, but a bit critical, at least for people using .xls. We probably also need to adjust the saved length (just guessing).
The majority of the biff7 streams we write are never read. They sit in dual-format streams and only the biff8 is ever read. The hang is much, much worse.
LOcalc correctly imports the xls file we export with the long strings, so this is an issue with our import code. I'd vote for applying the patch and fixing the read code when possible.
The problem might be on the read side, i.e., we may not handle BIFF_LABEL_v2 with continuation records right. LO can read the files we produce, but truncates the strings at ~2k characters. I think they fail to consider multiple continuation records. I'll have to check if XL can read our files.

Biff read code 0x204, length 8
Opcode 0x204 length 8 malloced? 0
Data:
0 | 05 00 01 00 17 00 10 27 XX XX XX XX XX XX XX XX | .......'********

(/home/welinder/gnome/gnumeric/src/.libs/lt-ssconvert:21777): gnumeric:read-WARNING **: File is most likely corrupted.
(Condition "q->length - 8 >= in_len" failed in excel_read_LABEL.)

Biff read code 0x3c, length 2078
Opcode 0x3c length 2078 malloced? 0
Data:
0 | 61 62 63 64 61 62 63 64 61 62 63 64 61 62 63 64 | abcdabcdabcdabcd
10 | 61 62 63 64 61 62 63 64 61 62 63 64 61 62 63 64 | abcdabcdabcdabcd
20 | 61 62 63 64 61 62 63 64 61 62 63 64 61 62 63 64 | abcdabcdabcdabcd
30 | 61 62 63 64 61 62 63 64 61 62 63 64 61 62 63 64 | abcdabcdabcdabcd
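The dump can be decoded by hand. The sketch below assumes the usual BIFF5/7 LABEL layout of four little-endian 16-bit fields (row, column, XF index, string length). The struct and function names are hypothetical, not Gnumeric's. Under that assumption the header announces a 10000-character string while the record itself holds 0 bytes of it, so the quoted condition fails, and the text sits in the 0x3c (CONTINUE) records that follow.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of the 8-byte header shown in the dump (assumed layout). */
typedef struct {
	uint16_t row;
	uint16_t col;
	uint16_t xf;
	uint16_t str_len;
} LabelHeader;

/* Read one little-endian 16-bit value. */
static uint16_t
get_u16_le (const uint8_t *p)
{
	return (uint16_t) (p[0] | (p[1] << 8));
}

/* Decode the first 8 bytes of a LABEL record. */
static LabelHeader
parse_label_header (const uint8_t *data)
{
	LabelHeader h;

	h.row = get_u16_le (data);
	h.col = get_u16_le (data + 2);
	h.xf = get_u16_le (data + 4);
	h.str_len = get_u16_le (data + 6);
	return h;
}

/*
 * The condition from the warning, "q->length - 8 >= in_len": true only if
 * the whole string fits in this one record, with no continuation needed.
 */
static int
string_fits_in_record (size_t record_len, const LabelHeader *h)
{
	return record_len >= 8 && record_len - 8 >= h->str_len;
}
```

Feeding it the bytes 05 00 01 00 17 00 10 27 gives row 5, column 1, XF 0x17 and a length of 0x2710 = 10000, and string_fits_in_record (8, &h) is false. A reader that wants the full string has to keep collecting bytes from each following continuation record until it has str_len of them.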
Patch pushed.
Excel is happy with the xls we create, so LO is buggy too.
The even byte number is a necessity in the unicode case. But not in other cases.
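A minimal sketch of that distinction, assuming "the unicode case" means two-byte (UTF-16) characters and that `offset` counts leading bytes that are not character data. The helper and its parameters are hypothetical and this is not the pushed patch.

```c
#include <stddef.h>

/*
 * Hypothetical helper: how many of `avail` bytes may be used for one chunk.
 * With two-byte characters, the payload after `offset` is rounded down to
 * an even count so that no character is split across records. With
 * one-byte characters every byte is usable, so a single trailing byte
 * still makes progress.
 */
static size_t
usable_chunk_bytes (size_t avail, size_t offset, int two_byte_chars)
{
	if (!two_byte_chars || avail < offset)
		return avail;
	return (avail - offset) / 2 * 2 + offset;
}
```

With the values from the hang (avail = 1, offset = 0) and one-byte characters, this returns 1 rather than 0.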
The read side of v7 labels has been fixed. Let's consider this fixed. Any fallout from the tests should get their own bugs.