After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 684842 - Seeks on GMemoryOutputStream don't have opaque semantics
Seeks on GMemoryOutputStream don't have opaque semantics
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: gio
unspecified
Other Linux
: Normal normal
: ---
Assigned To: gtkdev
gtkdev
Depends on:
Blocks:
 
 
Reported: 2012-09-26 01:33 UTC by Maciej (Matthew) Piechotka
Modified: 2013-10-23 15:33 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Change semantics of seek on memory output stream (6.39 KB, patch)
2012-09-26 01:33 UTC, Maciej (Matthew) Piechotka
none Details | Review
Change semantics of seek on memory output stream (2.94 KB, patch)
2012-09-26 01:36 UTC, Maciej (Matthew) Piechotka
none Details | Review
GSeekable: document seek-past-end semantics (2.88 KB, patch)
2013-10-22 20:17 UTC, Allison Karlitskaya (desrt)
committed Details | Review
Change semantics of seek on memory output stream (3.16 KB, patch)
2013-10-22 20:17 UTC, Allison Karlitskaya (desrt)
committed Details | Review
GMemoryInputStream: improve seek tests (5.42 KB, patch)
2013-10-22 20:18 UTC, Allison Karlitskaya (desrt)
committed Details | Review
GMemoryOutputStream: docs and whitespace fixes (8.15 KB, patch)
2013-10-22 20:20 UTC, Allison Karlitskaya (desrt)
committed Details | Review

Description Maciej (Matthew) Piechotka 2012-09-26 01:33:35 UTC
Created attachment 225188 [details] [review]
Change semantics of seek on memory output stream

Problem:

Currently the seek have very strange behaviour - whether it succeeds or fails depend on size of INTERNAL buffer. Take the following code:

    try {
       GLib.MemoryOutputStream str = new MemoryOutputStream(new uint8[2], GLib.realloc, GLib.free);
       str.write(buf);
       str.seek(1, SeekType.CUR);
    } catch(GLib.Error error) {
       stderr.printf("ERROR\n");
    }

Question - will the code succeeds?

  - If buf.length < 2 it succeeds
  - If buf.length == 2 it fails (!!!)
  - If buf.length > 2 && buf.length < MIN_ARRAY_SIZE it succeeds
  - If buf.length == (MIN_ARRAY_SIZE << k) it fails (!!!) (I.e. 16, 32, 64, 128, ...)
  - If buf.length > 2 && buf.length != (MIN_ARRAY_SIZE << k) it succeeds

This makes seeking on memory output streams non-trivial as it does not have uniform behaviour. In similar way the seek from the end taks the current data length so to use it user needs to know the size. This is slightly better because the querying the data is easier while the rellocation cannot be done.

Possible solution:

Make the reallocable memory output streams mirror the behaviour of resizable files while non-reallocable have the old behaviour of fixed size files.

Notes:
 - It changes the behaviour of API. However given that API behaviour was not documented and it depended on gio internals (marked for improve anyway) I don't think this would be big problem.
 - I know 2.34 branch have not been created and 2.34 is frozen (posted for future branch)
Comment 1 Maciej (Matthew) Piechotka 2012-09-26 01:36:08 UTC
Created attachment 225189 [details] [review]
Change semantics of seek on memory output stream

Ups. I added -a when I changed the commit message.
Comment 2 Dan Winship 2012-09-26 13:08:31 UTC
related to bug 681374
Comment 3 Maciej (Matthew) Piechotka 2013-08-07 14:19:57 UTC
Ping. Any chance for review?
Comment 4 Allison Karlitskaya (desrt) 2013-10-22 16:28:07 UTC
I think this patch is a step in the right direction but it needs a bit more thought, attention to detail, test cases and documentation.

I think the correct thing to do here is to cleanly split our semantics for this stream into two separate cases: those where we have the realloc function and those where we do not.

In the realloc case I think that the semantics should exactly mirror those of dealing with a normal file in POSIX, where the seek pointer starts at zero.  There is a reasonable question to be asked about the 'initial state' of the file.  Do we ignore the size of the originally-given buffer and treat the file as empty except for the parts that we wrote or do we treat the original buffer as the original content of the file (ie: what happens if we seek to the end immediately after creating the stream?).  I think if we have a realloc function, the best thing to do is to ignore the original allocated size and act like "empty file".

In the non-realloc case, I think we should treat it as a block device of a set size.  In this case, seeking to the end would definitely mean that we seek to the end (which is in contrast to my reasoning above).
Comment 5 Allison Karlitskaya (desrt) 2013-10-22 18:13:15 UTC
Note: seek-past-end on block devices is an EINVAL.

I'm going to try to do this, assuming Dan agrees that this is a reasonable way forward.
Comment 6 Dan Winship 2013-10-22 18:31:52 UTC
(In reply to comment #4)
> I think this patch is a step in the right direction but it needs a bit more
> thought, attention to detail, test cases and documentation.

In particular, g_seekable_seek() needs to clarify what happens when you seek out of bounds, in both input and output streams, resizable and fixed, and then we need to make everything agree with that.

> There is a reasonable question to be asked about the 'initial state' of the
> file.  Do we ignore the size of the originally-given buffer and treat the file
> as empty except for the parts that we wrote or do we treat the original buffer
> as the original content of the file (ie: what happens if we seek to the end
> immediately after creating the stream?).  I think if we have a realloc
> function, the best thing to do is to ignore the original allocated size and act
> like "empty file".

I think the critical point here is that other than the initial buffer size, the caller does not control the size of the buffer; GMemoryOutputStream itself decides how much to expand it, and when. So I think that definitely says that the buffer size is just an implementation detail, not part of the data, and so seeking to the end should move to priv->valid_len, not priv->len.

> In the non-realloc case, I think we should treat it as a block device of a set
> size.  In this case, seeking to the end would definitely mean that we seek to
> the end (which is in contrast to my reasoning above).

Agreed.

So yes, I think this is the right way forward.
Comment 7 Allison Karlitskaya (desrt) 2013-10-22 20:17:50 UTC
Created attachment 257872 [details] [review]
GSeekable: document seek-past-end semantics

Introduce the concept of "fixed" vs. "resizable" streams and document
how g_seekable_seek() works for each case.

We don't include g_seekable_is_fixed_size() at this point because we
don't know if anyone would require it.  This may appear in the future if
someone asks for it, however.
Comment 8 Allison Karlitskaya (desrt) 2013-10-22 20:17:58 UTC
Created attachment 257873 [details] [review]
Change semantics of seek on memory output stream
Comment 9 Allison Karlitskaya (desrt) 2013-10-22 20:18:32 UTC
Created attachment 257874 [details] [review]
GMemoryInputStream: improve seek tests

Improve test coverage for testing seeking on fixed vs. resizable
GMemoryOutputStream.
Comment 10 Allison Karlitskaya (desrt) 2013-10-22 20:20:10 UTC
Created attachment 257875 [details] [review]
GMemoryOutputStream: docs and whitespace fixes

Document the difference between resizable and fixed-sized streams,
particularly with regards to sizing and seeking.
Comment 11 Dan Winship 2013-10-23 01:02:10 UTC
Comment on attachment 257873 [details] [review]
Change semantics of seek on memory output stream

>From: Maciej Piechotka <uzytkownik2@gmail.com>

did you want to --reset-author that?

>Subject: [PATCH] Change semantics of seek on memory output stream
>
>https://bugzilla.gnome.org/show_bug.cgi?id=684842

needs more commit message


One edge case bug: if you create the stream with an initial buffer that isn't zero-filled, and you seek past part of that buffer and then start writing again, the skipped bytes don't get zeroed out.
Comment 12 Dan Winship 2013-10-23 01:02:50 UTC
Comment on attachment 257874 [details] [review]
GMemoryInputStream: improve seek tests

basically good I think, except that you said "GMemoryInputStream" rather than "GMemoryOutputStream" in the commit summary
Comment 13 Allison Karlitskaya (desrt) 2013-10-23 04:35:28 UTC
(In reply to comment #11)
> One edge case bug: if you create the stream with an initial buffer that isn't
> zero-filled, and you seek past part of that buffer and then start writing
> again, the skipped bytes don't get zeroed out.


I'm not sure this is a bug (but I'm not sure that it's not).  As I mentioned on IRC earlier, _get_data_size() and _steal_as_bytes() will be working somewhat oddly with respect to fixed-size streams under our new understanding of them as block devices.

In POSIX if I had a block device that contained some junk before I started using it, then I did seek() and write(), I wouldn't expect the junk to be covered by zeros...

Because the definition of _get_data_size() and _steal_as_bytes() are so much tied up in terms of "up to the last byte written", I'm not sure what else we can do here... the obvious alternative is to memset() the entire buffer when we receive it, but that doesn't really seem right either.

I guess your idea is that we should zero-out the bytes from valid_len to pos when doing a write if we discover that pos > valid_len?  This is very oddly side-effecty, imho...
Comment 14 Dan Winship 2013-10-23 13:17:52 UTC
(In reply to comment #13)
> I'm not sure this is a bug (but I'm not sure that it's not).

I think it's a bug, but I wouldn't want to bet money that no one's relying on the "buggy" behavior...

> I guess your idea is that we should zero-out the bytes from valid_len to pos
> when doing a write if we discover that pos > valid_len?  This is very oddly
> side-effecty, imho...

But it's basically what POSIX does...

I guess we can also just say "don't do that then". Streams created with g_output_stream_new_resizable() won't have this problem anyway.
Comment 15 Allison Karlitskaya (desrt) 2013-10-23 13:49:38 UTC
I'm going to commit as-is, though, with two changes:

 - explicitly document that the status quo is the desired behaviour

 - add a note about g_memory_output_stream_new_resizable() to the docs for
   g_memory_output_stream_new() instead of recommending passing (NULL, 0).
   (I didn't know this function existed before now).