After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 722470 - Could libgsf support the full range of zlib compression levels on output, please?
Could libgsf support the full range of zlib compression levels on output, ple...
Status: RESOLVED FIXED
Product: libgsf
Classification: Core
Component: ZIP
1.14.x
Other All
: Normal enhancement
: ---
Assigned To: Jody Goldberg
Jody Goldberg
Depends on:
Blocks:
 
 
Reported: 2014-01-18 03:18 UTC by Allin Cottrell
Modified: 2014-04-17 23:36 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
patch to fix this problem (1.80 KB, patch)
2014-01-18 03:18 UTC, Allin Cottrell
none Details | Review
makes "compression-level" property govern compression level (2.03 KB, patch)
2014-01-24 18:20 UTC, Allin Cottrell
none Details | Review
adds "deflate-level" property to gsf-outfile-zip.c (3.20 KB, patch)
2014-04-16 11:01 UTC, Allin Cottrell
none Details | Review

Description Allin Cottrell 2014-01-18 03:18:02 UTC
Created attachment 266599 [details] [review]
patch to fix this problem

Currently (version 1.14.9) if you try to set the property named "compression-level" (via gsf_outfile_new_child_full) to a value other than 0 or 8 you get the message "Unsupported compression level". I believe this is based on a confusion between the "level" and "method" arguments to the zlib function deflateInit2.
As regards "method", zlib only supports Z_DEFLATED (== 8), but as regards "level", the values 0 to 9 are supported. And it would cost essentially nothing to pass on this support to users of libgsf.

Use case: in effect, as regards zlib "level", properly speaking, libgsf only supports Z_DEFAULT_COMPRESSION (== 6), which is passed as a hard-wired value to 
deflateInit2. But this is a relatively expensive (time-consuming) compression level for big data. It would be nice to have the option of setting compression level 1, for example, which takes about half the time on output but which produces a file that's only about 10% bigger -- a good trade-off in some cases.

This is a bit confusing but hopefully the attached patch will give the idea. One might want to rename some internal variables but I've kept my diff to the minimum. The basic notion is to let the "compression-level" property actually represent the level and not the method.
Comment 1 Allin Cottrell 2014-01-18 03:21:17 UTC
libgsf version: I meant 1.14.29, not 1.14.9.
Comment 2 Allin Cottrell 2014-01-24 18:20:31 UTC
Created attachment 267146 [details] [review]
makes "compression-level" property govern compression level
Comment 3 Allin Cottrell 2014-01-24 18:21:52 UTC
I'd like to reframe this issue as a bug, and I'm attaching an improved
patch.

Here's why I want to call it a bug: the property of GsfOutfileZip
named "compression-level" does not set the compression level, it sets
the compression method. You have to examine the code in
gsf-outfile-zip.c to see this, and it's confusing to users of the API.

In principle the fix could go in either of two ways: rename the
property in question as "compression-method", or make it govern the
compression level as advertised. I'm proposing the latter. The former
would not be very useful at present because GsfOutfileZip in fact only
supports one compression method (besides null compression), namely
zlib's "Deflate". This is because GsfOutfileZip relies on zlib for
compression and current zlib only supports Deflate. 

If in future zlib adds support for more methods, then it may be
worthwhile to introduce a "compression-method" property for
GsfOutfileZip. Meanwhile, what would be useful is the ability to set
the zlib Deflate "level". Valid values for this are in the range 0 to
9, although a value of -1 (Z_DEFAULT_COMPRESSION) is accepted as an
argument to zlib's deflateInit2 (where it translates to a level of 6).
In particular, users of the libgsf API may want to see the Deflate
level to a smaller positive value than the default of 6, to get
appreciably faster compression of big data at very little cost in
terms of file size.

So my patch makes some minor changes to the internals such that
"compression-level" really means compression level.

Is this backward-incompatible? I'd say, not in any significant way.
At present you can only set the so-called "compression-level" to 0 (no
compression) or 8 (Deflate compression, the default); any other value
generates a warning and is ignored. Under my proposal, setting a value
of 0 will have the same effect as before. Setting a value of 8 would
have a different effect: instead of just confirming the use of Deflate
(at level 6, in fact), it would actually set a Deflate level of 8.
But it seems unlikely that anyone is explicitly setting a value of 8
at present -- why would they bother when it's the default? Besides, I
think the benefit of being able to set the compression level properly
outweights the (possible but unlikely) cost of some users getting
Deflate level 8 when they expected to get 6.
Comment 4 Morten Welinder 2014-02-13 18:25:45 UTC
The issues here are...

1. What is the use case for this?  What kind of files are you stuffing into
   zip archives?  How big are they?

2. Compatibility bothers me.  It's really hard to know if people are
   setting all options, even default ones.  And suddenly and unexpectedly
   getting a level 8 might be quite slow.
Comment 5 Morten Welinder 2014-02-13 18:37:08 UTC
I agree, btw., that the level/method naming is a bit of a mess here.
The GsfOutfileZip property called "compression-level" really selects
a zip file method.  We support only two of the possible values at
present.

We're stuck with the property name, but I guess we could add another
property as a parameter to the method.  "Stored" would ignore that,
and DEFLATED would treat it as a zlib compression level.
Comment 6 Allin Cottrell 2014-02-13 18:59:54 UTC
(In reply to comment #4)
> The issues here are...
> 
> 1. What is the use case for this?  What kind of files are you stuffing into
>    zip archives?  How big are they?

Socio-economic survey data. For example, 60000 observations on 1200 variables; when stored as doubles that's an uncompressed size of about 550 MB.

At zlib level 1 the data are compressed to about 10 percent of the original size in a save time of under 4 seconds on Core i7. At level 6 the output is about 7 percent of the original size but compression takes over three times longer. 

> 2. Compatibility bothers me.  It's really hard to know if people are
>    setting all options, even default ones.  And suddenly and unexpectedly
>    getting a level 8 might be quite slow.

I'd be very surprised if people have correct expectations of the effect of setting the current "compression-level" to a positive value since it doesn't do what's claimed and you have to read the source code to figure out what it actually does. However, if that were really a concern one could add a property named, say, "zlib-compression-level" that does what's done in my patch. Alternatively, one could ensure backward compatibility by mapping 8 to 6 in input to the "compression-level" setting. This would obviously be a hack but we're talking about a property that's just wrong at present so correct operation plus perfect compatibility is not an option.
Comment 7 Allin Cottrell 2014-02-13 19:02:54 UTC
(In reply to comment #5)
> We're stuck with the property name, but I guess we could add another
> property as a parameter to the method.  "Stored" would ignore that,
> and DEFLATED would treat it as a zlib compression level.

That sounds promising.
Comment 8 Allin Cottrell 2014-04-14 10:47:58 UTC
Could I nudge this a little? It would be very useful to be able to set the deflate (zlib) level. Would you be open to a revised patch that adds a property named, say, "deflate_level"? (This would leave "compression_level" as is.)
Comment 9 Morten Welinder 2014-04-14 12:39:54 UTC
An updated patch with a separate property would make this go through fast.
Comment 10 Allin Cottrell 2014-04-16 10:59:33 UTC
OK, new patch coming up. I've made one (purely internal) change to the existing PROP_COMPRESSION_LEVEL property, relabeling it as PROP_COMPRESSION_METHOD. This is just in the interest of comprehensibility of the code.
Comment 11 Allin Cottrell 2014-04-16 11:01:26 UTC
Created attachment 274445 [details] [review]
adds "deflate-level" property to gsf-outfile-zip.c
Comment 12 Morten Welinder 2014-04-17 23:36:47 UTC
This problem has been fixed in our software repository. The fix will go into the next software release. Thank you for your bug report.