GNOME Bugzilla – Bug 662057
Slow saving of sparse file to ODF
Last modified: 2013-04-22 21:04:52 UTC
Created attachment 199292: sparse sample file. The attached file has four cells with content: A1, A1048576, XFD1, and XFD1048576. Saving this file as .gnumeric is virtually instantaneous. Saving it to ODF takes virtually forever.
xlsx had the identical problem, which I fixed a week ago; see bug 685530. The problem here is odf_write_content_rows. It looks at every possible cell in the extent; for A1:XFD1048576 that's an insane 16G cells (16384 columns x 1048576 rows). The good news is that it's solvable.
1. We need all_cells, like in xlsx_write_cells.
2. Since ods seems to require a table-row even if empty (is that true?), we cannot make use of the boring_count trick. Just assume all entries are zero.
3. Using all_cells we can instantly know if the working row has any cells whatsoever. I don't know the code well enough to know if we must look for (non-existing) cells covered by merges.
4. To figure out whether anything in the row has a style that is not the default col-style, we can use the count_default_rows trick.
5. If there are no cells and no non-default styles in the row, skip to the next row in a hurry.
That should be enough, but the cake can be iced:
6. Acquire all styles for the entire row in one go, not in 16384 pieces. We have sheet_style_get_row for that, although a slightly more convenient wrapper might be called for.
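A minimal sketch of steps 1, 3 and 5, written in Python rather than Gnumeric's C and not taken from the actual patch. It assumes the sheet's existing cells are available as a mapping from (row, col) to value; the names rows_with_content, MAX_COLS and MAX_ROWS are hypothetical. Sorting the existing cells once by (row, col) plays the role of all_cells, so the writer only ever visits rows that actually contain something. Styles and merges (steps 3b, 4 and 6) are deliberately left out.

```python
from itertools import groupby

MAX_COLS = 16384     # column XFD (hypothetical constant name)
MAX_ROWS = 1048576   # last row of the sheet (hypothetical constant name)


def rows_with_content(cells):
    """Yield (row, [(col, value), ...]) for each row that has at least one cell.

    `cells` maps (row, col) -> value and holds only cells that exist,
    so the work is proportional to the number of cells, not the extent.
    """
    # Sort once by (row, col); this stands in for a presorted all_cells array.
    ordered = sorted(cells.items())
    # Group consecutive entries that share a row index.
    for row, group in groupby(ordered, key=lambda item: item[0][0]):
        yield row, [(col, value) for (_, col), value in group]


# The four-cell sample from the bug report, using 0-based indices.
sparse = {
    (0, 0): "A1",
    (MAX_ROWS - 1, 0): "A1048576",
    (0, MAX_COLS - 1): "XFD1",
    (MAX_ROWS - 1, MAX_COLS - 1): "XFD1048576",
}

visited = list(rows_with_content(sparse))
dense_cell_count = MAX_COLS * MAX_ROWS  # what a cell-by-cell scan of the extent would touch
```

With the sample file this visits two rows and four cells, against the roughly 1.7e10 positions a scan of the full extent would touch.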
Re 2: ODF requires a table-row even if empty, but it has table:number-rows-repeated, so 100 consecutive empty rows need only a single table-row. We currently do that for the region before and after the extent. Re 3: ODF has table:covered-table-cell to mark covered cells, but this is just a placeholder. So if the row has only empty cells with default style, we do not need to specify any row records, covered or not.
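A rough illustration of how the repeat count could be derived, again in Python and not from the Gnumeric source. It assumes the set of rows that have content or non-default styles is already known; row_runs is a hypothetical helper. Each run of empty rows becomes a single entry whose count would be written as table:number-rows-repeated, and each non-empty row is emitted on its own.

```python
def row_runs(content_rows, total_rows):
    """Yield (first_row, count, has_content) covering rows 0..total_rows-1.

    Consecutive empty rows are collapsed into one run; its `count` is the
    value that would go into table:number-rows-repeated.
    """
    next_row = 0
    for row in sorted(set(content_rows)):
        if row > next_row:
            # Gap of empty rows before this non-empty row: one repeated run.
            yield next_row, row - next_row, False
        yield row, 1, True
        next_row = row + 1
    if next_row < total_rows:
        # Trailing empty rows after the last non-empty row.
        yield next_row, total_rows - next_row, False


# Rows 1 and 1048576 of the sample file (0-based: 0 and 1048575) have content.
runs = list(row_runs({0, 1048575}, 1048576))
```

For the sample file this gives three entries instead of 1048576 separate rows: the first row, one run of 1048574 empty rows, and the last row.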
1. is done.
Created attachment 226422: Preliminary patch. This mostly fixes the problems. Issues:
1. Debug output.
2. We need to prescan all objects for a sheet.
3. This makes no attempt to merge empty rows into one with a repeat count.
Fixed. Save is sub-second now. Leaving open for one thing: with this in place, do we need all the logic searching for an empty top and/or bottom?
There is no reason I can see to retain the separate top/bottom search.
Andreas, could I ask you to have a go at simplifying that code? I am not sure what is going on with print ranges and stuff there.
Yes I can do that. (I am currently buried in work, so it may take a while, but this does not seem to be extremely urgent.)
I can't find any unneeded logic for an empty top and/or bottom anymore: odf_write_sheet just handles the column header and then calls odf_write_content_rows up to 3 times to wrap the table-header-rows correctly. odf_write_content_rows does not search for starting/ending empty rows. So this seems to be done already. This problem has been fixed in the development version. The fix will be available in the next major software release. Thank you for your bug report.