Bug 662057 – Slow saving of sparse file to ODF




Summary:	Slow saving of sparse file to ODF


Status:	RESOLVED FIXED

Product:	Gnumeric
Classification:	Applications
Component:	import/export OOo / OASIS
Version:	git master
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	Andreas J. Guelzow
QA Contact:	Jody Goldberg

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2011-10-18 02:19 UTC by Andreas J. Guelzow
Modified:	2013-04-22 21:04 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
sparse sample file (5.12 KB, application/gnumeric) 2011-10-18 02:19 UTC, Andreas J. Guelzow
Preliminary patch (5.12 KB, patch) 2012-10-14 19:32 UTC, Morten Welinder

Description Andreas J. Guelzow 2011-10-18 02:19:17 UTC

Created attachment 199292
sparse sample file

The attached file has four cells with content: A1, A1048576, XFD1, and
XFD1048576.

Saving this file as .gnumeric is virtually instantaneous. Saving this file to ODF takes virtually forever.

Comment 1 Morten Welinder 2012-10-13 14:58:30 UTC

xlsx had the identical problem which I fixed a week ago, see bug 685530.

The problem here is odf_write_content_rows.  It looks at every possible cell
in the extent -- for A1:XFD1048576 that's an insane 16G cells.
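
For scale, the 16G figure follows directly from the sheet size implied by the last cell, XFD1048576. The snippet below is only illustrative arithmetic, not Gnumeric code; the helper name column_index is made up for this example.

```python
def column_index(letters: str) -> int:
    """Convert a column label such as 'XFD' to a 1-based column number."""
    index = 0
    for ch in letters:
        # Column labels are base-26 with digits A=1 .. Z=26.
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


COLUMNS = column_index("XFD")  # number of columns up to and including XFD
ROWS = 1048576                 # last row number in the sample file
TOTAL_CELLS = COLUMNS * ROWS   # cells visited by a dense scan of the extent

print(COLUMNS, TOTAL_CELLS, TOTAL_CELLS // 2**30)
```

A dense walk over the extent therefore touches 2^34 positions in order to find four cells with content.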

The good news is that it's solvable.

1. We need all_cells like in xlsx_write_cells.

2. Since ods seems to require a table-row even if empty -- is that true? --
   we cannot make use of the boring_count trick.  Just assume all entries are
   zero.

3. Using all_cells we can instantly know if the working row has any cells
   whatsoever.  I don't know the code well enough to know if we must look
   for (non-existing) cells covered by merges.

4. To figure out whether anything in the row has a style that is not the
   default col-style, we can use the count_default_rows trick.

5. If there are no cells and no non-default styles in the row, skip to the
   next row in a hurry.

That should be enough, but the cake can be iced:

6. Acquire all styles for the entire row in one go, not in 16384 pieces.
   We have sheet_style_get_row for that, although a slightly more convenient
   wrapper might be called for.
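
Steps 1, 3 and 5 can be sketched roughly as follows. This is a simplified Python model, not the actual exporter: rows_to_visit and its arguments are hypothetical, with all_cells standing in for the list of existing cells from step 1 and styled_rows standing in for whatever step 4 reports as rows with non-default styles.

```python
def rows_to_visit(all_cells, styled_rows):
    """Yield (row, columns) only for rows that need real work.

    all_cells   -- iterable of (row, col) positions of existing cells
                   (hypothetical stand-in for the list in step 1)
    styled_rows -- set of row numbers that have some non-default style
                   (hypothetical stand-in for the result of step 4)
    """
    cells_by_row = {}
    for row, col in sorted(all_cells):
        cells_by_row.setdefault(row, []).append(col)

    # Rows with neither cells nor non-default styles are never visited (step 5).
    for row in sorted(set(cells_by_row) | set(styled_rows)):
        yield row, cells_by_row.get(row, [])


# The four corner cells of the sample file, as 0-based (row, col) pairs.
corners = [(0, 0), (1048575, 0), (0, 16383), (1048575, 16383)]
for row, cols in rows_to_visit(corners, styled_rows=set()):
    print(row, cols)
```

With the sample file, the loop body runs twice instead of once per row of the extent; the cost depends on the number of existing cells, not on the size of the extent.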

Comment 2 Andreas J. Guelzow 2012-10-13 15:43:11 UTC

re 2, odf requires table-row even if empty but it has a table:number-rows-repeated, so that for 100 consecutive empty rows it only needs a single table-row. We currently do that for the region before and after the extent. 
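
A rough Python sketch of that idea, assuming we already know which rows need individual output: split the row range into runs and emit a single repeated table-row for each run of empty rows. The function names and the simplified XML are illustrative assumptions; a real ODF row also carries style attributes that are omitted here.

```python
def row_runs(first_row, last_row, interesting_rows):
    """Split [first_row, last_row] into (start, count, is_empty) runs."""
    runs = []
    cursor = first_row
    for row in sorted(set(interesting_rows)):
        if row < first_row or row > last_row:
            continue
        if row > cursor:
            # Gap of empty rows before this row: one run covers all of them.
            runs.append((cursor, row - cursor, True))
        runs.append((row, 1, False))
        cursor = row + 1
    if cursor <= last_row:
        runs.append((cursor, last_row - cursor + 1, True))
    return runs


def empty_run_xml(count, columns):
    """Simplified markup for `count` consecutive empty rows."""
    repeat = f' table:number-rows-repeated="{count}"' if count > 1 else ""
    cell = f'<table:table-cell table:number-columns-repeated="{columns}"/>'
    return f"<table:table-row{repeat}>{cell}</table:table-row>"


for start, count, is_empty in row_runs(0, 1048575, {0, 1048575}):
    print(start, count, empty_run_xml(count, 16384) if is_empty else "(row with content)")
```

For the sample file this yields three runs: row 0, one repeated row standing in for the 1048574 empty rows in between, and the last row.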

re 3, odf has table:covered-table-cell to mark cells covered but this is just a placeholder. So if the row has only empty cells with default style we do not need to specify any row records, covered or not.

Comment 3 Morten Welinder 2012-10-14 13:24:10 UTC

1. is done.

Comment 4 Morten Welinder 2012-10-14 19:32:09 UTC

Created attachment 226422
Preliminary patch

This mostly fixes the problems.  Issues:

1. Debug output.
2. We need to prescan all objects for a sheet.
3. This makes no attempts to merge empty rows into one with a repeat count.

Comment 5 Morten Welinder 2012-10-15 13:48:20 UTC

Fixed.  Save is sub-second now.

Leaving open for one thing: with this in place, do we need all the logic
searching for an empty top and/or bottom?

Comment 6 Andreas J. Guelzow 2012-10-15 14:22:57 UTC

There is no reason I can see to retain the separate top/bottom search.

Comment 7 Morten Welinder 2012-10-16 00:13:11 UTC

Andreas, could I ask you to have a go at simplifying that code?  I am not
sure what is going on with print ranges and stuff there.

Comment 8 Andreas J. Guelzow 2012-10-16 00:33:57 UTC

Yes I can do that. (I am currently buried in work, so it may take a while, but this does not seem to be extremely urgent.)

Comment 9 Andreas J. Guelzow 2013-04-22 21:04:52 UTC

I can't find any unneeded logic for empty top and/or bottom anymore:

odf_write_sheet just handles the column header and then calls odf_write_content_rows up to 3 times to wrap the table-header-rows correctly.

odf_write_content_rows does not search for starting/ending empty rows.

So this seems to be done already.

This problem has been fixed in the development version. The fix will be available in the next major software release. Thank you for your bug report.