GNOME Bugzilla – Bug 684842
Seeks on GMemoryOutputStream don't have opaque semantics
Last modified: 2013-10-23 15:33:59 UTC
Created attachment 225188 [details] [review] Change semantics of seek on memory output stream Problem: Currently the seek have very strange behaviour - whether it succeeds or fails depend on size of INTERNAL buffer. Take the following code: try { GLib.MemoryOutputStream str = new MemoryOutputStream(new uint8[2], GLib.realloc, GLib.free); str.write(buf); str.seek(1, SeekType.CUR); } catch(GLib.Error error) { stderr.printf("ERROR\n"); } Question - will the code succeeds? - If buf.length < 2 it succeeds - If buf.length == 2 it fails (!!!) - If buf.length > 2 && buf.length < MIN_ARRAY_SIZE it succeeds - If buf.length == (MIN_ARRAY_SIZE << k) it fails (!!!) (I.e. 16, 32, 64, 128, ...) - If buf.length > 2 && buf.length != (MIN_ARRAY_SIZE << k) it succeeds This makes seeking on memory output streams non-trivial as it does not have uniform behaviour. In similar way the seek from the end taks the current data length so to use it user needs to know the size. This is slightly better because the querying the data is easier while the rellocation cannot be done. Possible solution: Make the reallocable memory output streams mirror the behaviour of resizable files while non-reallocable have the old behaviour of fixed size files. Notes: - It changes the behaviour of API. However given that API behaviour was not documented and it depended on gio internals (marked for improve anyway) I don't think this would be big problem. - I know 2.34 branch have not been created and 2.34 is frozen (posted for future branch)
Created attachment 225189 [details] [review] Change semantics of seek on memory output stream Ups. I added -a when I changed the commit message.
related to bug 681374
Ping. Any chance for review?
I think this patch is a step in the right direction but it needs a bit more thought, attention to detail, test cases and documentation. I think the correct thing to do here is to cleanly split our semantics for this stream into two separate cases: those where we have the realloc function and those where we do not. In the realloc case I think that the semantics should exactly mirror those of dealing with a normal file in POSIX, where the seek pointer starts at zero. There is a reasonable question to be asked about the 'initial state' of the file. Do we ignore the size of the originally-given buffer and treat the file as empty except for the parts that we wrote or do we treat the original buffer as the original content of the file (ie: what happens if we seek to the end immediately after creating the stream?). I think if we have a realloc function, the best thing to do is to ignore the original allocated size and act like "empty file". In the non-realloc case, I think we should treat it as a block device of a set size. In this case, seeking to the end would definitely mean that we seek to the end (which is in contrast to my reasoning above).
Note: seek-past-end on block devices is an EINVAL. I'm going to try to do this, assuming Dan agrees that this is a reasonable way forward.
(In reply to comment #4) > I think this patch is a step in the right direction but it needs a bit more > thought, attention to detail, test cases and documentation. In particular, g_seekable_seek() needs to clarify what happens when you seek out of bounds, in both input and output streams, resizable and fixed, and then we need to make everything agree with that. > There is a reasonable question to be asked about the 'initial state' of the > file. Do we ignore the size of the originally-given buffer and treat the file > as empty except for the parts that we wrote or do we treat the original buffer > as the original content of the file (ie: what happens if we seek to the end > immediately after creating the stream?). I think if we have a realloc > function, the best thing to do is to ignore the original allocated size and act > like "empty file". I think the critical point here is that other than the initial buffer size, the caller does not control the size of the buffer; GMemoryOutputStream itself decides how much to expand it, and when. So I think that definitely says that the buffer size is just an implementation detail, not part of the data, and so seeking to the end should move to priv->valid_len, not priv->len. > In the non-realloc case, I think we should treat it as a block device of a set > size. In this case, seeking to the end would definitely mean that we seek to > the end (which is in contrast to my reasoning above). Agreed. So yes, I think this is the right way forward.
Created attachment 257872 [details] [review] GSeekable: document seek-past-end semantics Introduce the concept of "fixed" vs. "resizable" streams and document how g_seekable_seek() works for each case. We don't include g_seekable_is_fixed_size() at this point because we don't know if anyone would require it. This may appear in the future if someone asks for it, however.
Created attachment 257873 [details] [review] Change semantics of seek on memory output stream
Created attachment 257874 [details] [review] GMemoryInputStream: improve seek tests Improve test coverage for testing seeking on fixed vs. resizable GMemoryOutputStream.
Created attachment 257875 [details] [review] GMemoryOutputStream: docs and whitespace fixes Document the difference between resizable and fixed-sized streams, particularly with regards to sizing and seeking.
Comment on attachment 257873 [details] [review] Change semantics of seek on memory output stream >From: Maciej Piechotka <uzytkownik2@gmail.com> did you want to --reset-author that? >Subject: [PATCH] Change semantics of seek on memory output stream > >https://bugzilla.gnome.org/show_bug.cgi?id=684842 needs more commit message One edge case bug: if you create the stream with an initial buffer that isn't zero-filled, and you seek past part of that buffer and then start writing again, the skipped bytes don't get zeroed out.
Comment on attachment 257874 [details] [review] GMemoryInputStream: improve seek tests basically good I think, except that you said "GMemoryInputStream" rather than "GMemoryOutputStream" in the commit summary
(In reply to comment #11) > One edge case bug: if you create the stream with an initial buffer that isn't > zero-filled, and you seek past part of that buffer and then start writing > again, the skipped bytes don't get zeroed out. I'm not sure this is a bug (but I'm not sure that it's not). As I mentioned on IRC earlier, _get_data_size() and _steal_as_bytes() will be working somewhat oddly with respect to fixed-size streams under our new understanding of them as block devices. In POSIX if I had a block device that contained some junk before I started using it, then I did seek() and write(), I wouldn't expect the junk to be covered by zeros... Because the definition of _get_data_size() and _steal_as_bytes() are so much tied up in terms of "up to the last byte written", I'm not sure what else we can do here... the obvious alternative is to memset() the entire buffer when we receive it, but that doesn't really seem right either. I guess your idea is that we should zero-out the bytes from valid_len to pos when doing a write if we discover that pos > valid_len? This is very oddly side-effecty, imho...
(In reply to comment #13) > I'm not sure this is a bug (but I'm not sure that it's not). I think it's a bug, but I wouldn't want to bet money that no one's relying on the "buggy" behavior... > I guess your idea is that we should zero-out the bytes from valid_len to pos > when doing a write if we discover that pos > valid_len? This is very oddly > side-effecty, imho... But it's basically what POSIX does... I guess we can also just say "don't do that then". Streams created with g_output_stream_new_resizable() won't have this problem anyway.
I'm going to commit as-is, though, with two changes: - explicitly document that the status quo is the desired behaviour - add a note about g_memory_output_stream_new_resizable() to the docs for g_memory_output_stream_new() instead of recommending passing (NULL, 0). (I didn't know this function existed before now).