GNOME Bugzilla – Bug 722470
Could libgsf support the full range of zlib compression levels on output, please?
Last modified: 2014-04-17 23:36:47 UTC
Created attachment 266599 [details] [review] patch to fix this problem Currently (version 1.14.9) if you try to set the property named "compression-level" (via gsf_outfile_new_child_full) to a value other than 0 or 8 you get the message "Unsupported compression level". I believe this is based on a confusion between the "level" and "method" arguments to the zlib function deflateInit2. As regards "method", zlib only supports Z_DEFLATED (== 8), but as regards "level", the values 0 to 9 are supported. And it would cost essentially nothing to pass on this support to users of libgsf. Use case: in effect, as regards zlib "level", properly speaking, libgsf only supports Z_DEFAULT_COMPRESSION (== 6), which is passed as a hard-wired value to deflateInit2. But this is a relatively expensive (time-consuming) compression level for big data. It would be nice to have the option of setting compression level 1, for example, which takes about half the time on output but which produces a file that's only about 10% bigger -- a good trade-off in some cases. This is a bit confusing but hopefully the attached patch will give the idea. One might want to rename some internal variables but I've kept my diff to the minimum. The basic notion is to let the "compression-level" property actually represent the level and not the method.
libgsf version: I meant 1.14.29, not 1.14.9.
Created attachment 267146 [details] [review] makes "compression-level" property govern compression level
I'd like to reframe this issue as a bug, and I'm attaching an improved patch. Here's why I want to call it a bug: the property of GsfOutfileZip named "compression-level" does not set the compression level, it sets the compression method. You have to examine the code in gsf-outfile-zip.c to see this, and it's confusing to users of the API. In principle the fix could go in either of two ways: rename the property in question as "compression-method", or make it govern the compression level as advertised. I'm proposing the latter. The former would not be very useful at present because GsfOutfileZip in fact only supports one compression method (besides null compression), namely zlib's "Deflate". This is because GsfOutfileZip relies on zlib for compression and current zlib only supports Deflate. If in future zlib adds support for more methods, then it may be worthwhile to introduce a "compression-method" property for GsfOutfileZip. Meanwhile, what would be useful is the ability to set the zlib Deflate "level". Valid values for this are in the range 0 to 9, although a value of -1 (Z_DEFAULT_COMPRESSION) is accepted as an argument to zlib's deflateInit2 (where it translates to a level of 6). In particular, users of the libgsf API may want to see the Deflate level to a smaller positive value than the default of 6, to get appreciably faster compression of big data at very little cost in terms of file size. So my patch makes some minor changes to the internals such that "compression-level" really means compression level. Is this backward-incompatible? I'd say, not in any significant way. At present you can only set the so-called "compression-level" to 0 (no compression) or 8 (Deflate compression, the default); any other value generates a warning and is ignored. Under my proposal, setting a value of 0 will have the same effect as before. Setting a value of 8 would have a different effect: instead of just confirming the use of Deflate (at level 6, in fact), it would actually set a Deflate level of 8. But it seems unlikely that anyone is explicitly setting a value of 8 at present -- why would they bother when it's the default? Besides, I think the benefit of being able to set the compression level properly outweights the (possible but unlikely) cost of some users getting Deflate level 8 when they expected to get 6.
The issues here are... 1. What is the use case for this? What kind of files are you stuffing into zip archives? How big are they? 2. Compatibility bothers me. It's really hard to know if people are setting all options, even default ones. And suddenly and unexpectedly getting a level 8 might be quite slow.
I agree, btw., that the level/method naming is a bit of a mess here. The GsfOutfileZip property called "compression-level" really selects a zip file method. We support only two of the possible values at present. We're stuck with the property name, but I guess we could add another property as a parameter to the method. "Stored" would ignore that, and DEFLATED would treat it as a zlib compression level.
(In reply to comment #4) > The issues here are... > > 1. What is the use case for this? What kind of files are you stuffing into > zip archives? How big are they? Socio-economic survey data. For example, 60000 observations on 1200 variables; when stored as doubles that's an uncompressed size of about 550 MB. At zlib level 1 the data are compressed to about 10 percent of the original size in a save time of under 4 seconds on Core i7. At level 6 the output is about 7 percent of the original size but compression takes over three times longer. > 2. Compatibility bothers me. It's really hard to know if people are > setting all options, even default ones. And suddenly and unexpectedly > getting a level 8 might be quite slow. I'd be very surprised if people have correct expectations of the effect of setting the current "compression-level" to a positive value since it doesn't do what's claimed and you have to read the source code to figure out what it actually does. However, if that were really a concern one could add a property named, say, "zlib-compression-level" that does what's done in my patch. Alternatively, one could ensure backward compatibility by mapping 8 to 6 in input to the "compression-level" setting. This would obviously be a hack but we're talking about a property that's just wrong at present so correct operation plus perfect compatibility is not an option.
(In reply to comment #5) > We're stuck with the property name, but I guess we could add another > property as a parameter to the method. "Stored" would ignore that, > and DEFLATED would treat it as a zlib compression level. That sounds promising.
Could I nudge this a little? It would be very useful to be able to set the deflate (zlib) level. Would you be open to a revised patch that adds a property named, say, "deflate_level"? (This would leave "compression_level" as is.)
An updated patch with a separate property would make this go through fast.
OK, new patch coming up. I've made one (purely internal) change to the existing PROP_COMPRESSION_LEVEL property, relabeling it as PROP_COMPRESSION_METHOD. This is just in the interest of comprehensibility of the code.
Created attachment 274445 [details] [review] adds "deflate-level" property to gsf-outfile-zip.c
This problem has been fixed in our software repository. The fix will go into the next software release. Thank you for your bug report.