After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 792780 - gbytes should reference toplevel bytes when slicing with g_bytes_new_from_bytes
gbytes should reference toplevel bytes when slicing with g_bytes_new_from_bytes
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: general
unspecified
Other Linux
: Normal normal
: ---
Assigned To: gtkdev
gtkdev
Depends on:
Blocks:
 
 
Reported: 2018-01-22 12:36 UTC by Christian Hergert
Modified: 2018-01-27 20:19 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
bytes: avoid intermediate refs in g_bytes_new_from_bytes() (1.91 KB, patch)
2018-01-22 23:36 UTC, Christian Hergert
none Details | Review
tests: additional test for g_bytes_new_from_bytes() (3.04 KB, patch)
2018-01-22 23:36 UTC, Christian Hergert
none Details | Review
bytes: avoid intermediate refs in g_bytes_new_from_bytes() (2.48 KB, patch)
2018-01-26 08:31 UTC, Christian Hergert
none Details | Review
tests: additional test for g_bytes_new_from_bytes() (3.98 KB, patch)
2018-01-26 08:31 UTC, Christian Hergert
none Details | Review
bytes: avoid intermediate refs in g_bytes_new_from_bytes() (2.59 KB, patch)
2018-01-27 02:47 UTC, Christian Hergert
committed Details | Review
tests: additional test for g_bytes_new_from_bytes() (4.28 KB, patch)
2018-01-27 02:47 UTC, Christian Hergert
committed Details | Review

Description Christian Hergert 2018-01-22 12:36:43 UTC
Currently, g_bytes_new_from_bytes() just references the bytes it was given. That means you can get in a situation where you can create lots of GBytes, each 1-byte into the parent gbytes.

For example, imagine a situation where you're trying to write all of a GBytes to a socket, and after each write attempt, you g_bytes_new_from_bytes(data, n_read, len-n_read). A carefully crafted peer could cause you to allocate N-1 gbytes slices.

If instead, we add a bit to the GBytes structure, to denote the structure is a GBytes ref, we could walk up to the top-most gbytes and reference that instead.

It would ensure we only need to hold on to 2 GBytes at any time.
Comment 1 Philip Withnall 2018-01-22 12:45:48 UTC
(In reply to Christian Hergert from comment #0)
> Currently, g_bytes_new_from_bytes() just references the bytes it was given.
> That means you can get in a situation where you can create lots of GBytes,
> each 1-byte into the parent gbytes.
> 
> For example, imagine a situation where you're trying to write all of a
> GBytes to a socket, and after each write attempt, you
> g_bytes_new_from_bytes(data, n_read, len-n_read). A carefully crafted peer
> could cause you to allocate N-1 gbytes slices.
> 
> If instead, we add a bit to the GBytes structure, to denote the structure is
> a GBytes ref, we could walk up to the top-most gbytes and reference that
> instead.

That would be O(N) in the number of iterations of your writer loop, no?

> It would ensure we only need to hold on to 2 GBytes at any time.

This seems more like the kind of thing which should be solved at the application level, by always doing your g_bytes_new_from_bytes() on the top-level GBytes in the first place.
Comment 2 Christian Hergert 2018-01-22 22:54:10 UTC
(In reply to Philip Withnall from comment #1)
> > If instead, we add a bit to the GBytes structure, to denote the structure is
> > a GBytes ref, we could walk up to the top-most gbytes and reference that
> > instead.
> 
> That would be O(N) in the number of iterations of your writer loop, no?

Correct

> > It would ensure we only need to hold on to 2 GBytes at any time.
> 
> This seems more like the kind of thing which should be solved at the
> application level, by always doing your g_bytes_new_from_bytes() on the
> top-level GBytes in the first place.

It's certainly something the application can work around, but if you have to keep a GBytes + offset what's the point of using GBytes? I might as well just keep a malloc'd buffer around and reference that.

The pattern that was common in Python (and as I understand it inspired such slice-able memory with GBytes) was something like:

    n_written = f.write(data)
    data = data[n_written:]

I wouldn't expect that to hold onto N slices, but always re-slice from the top-most buffer.

I also realize that we can do this without any extra bits. We can just check that struct _GBytes.free_func == g_bytes_unref.
Comment 3 Christian Hergert 2018-01-22 23:36:03 UTC
Created attachment 367258 [details] [review]
bytes: avoid intermediate refs in g_bytes_new_from_bytes()

When referencing a GBytes that is already a slice of another GBytes, we
can avoid referencing the intermediate GBytes and instead reference the
root bytes.

Doing so helps avoid keeping N GBytes instances alive when the
intermediates would have otherwise been finalized.
Comment 4 Christian Hergert 2018-01-22 23:36:07 UTC
Created attachment 367259 [details] [review]
tests: additional test for g_bytes_new_from_bytes()

This adds another test that ensures we are creating sub-slices properly
and that they reference the topmost bytes.

The structure for GBytes is copied into the test so that we can do field
integrety checking as part of the test.
Comment 5 Philip Withnall 2018-01-24 16:24:53 UTC
Review of attachment 367258 [details] [review]:

::: glib/gbytes.c
@@ +222,3 @@
+  /* Avoid an extra GBytes if all bytes were requested */
+  if (offset == 0 && length == bytes->size)
+    return g_bytes_ref (bytes);

A unit test for this path would be useful, if one doesn’t exist already.

@@ +228,3 @@
+  /* Avoid referencing intermediate GBytes */
+  while (bytes->free_func == (gpointer)g_bytes_unref)
+    bytes = bytes->user_data;

Actually, thinking about it, this should only ever do one iteration: when creating the first child bytes, its user_data will point to the root bytes. When creating the second child bytes from the first, its user_data will be copied from the first bytes’ (and hence will be the root bytes), etc.

Best leave it as a loop, but maybe add a comment explaining that it will typically be O(1) rather than O(N).

Also, it would be good to add something to the documentation for g_bytes_new_from_bytes() mentioning that this code pattern is supported and efficient.
Comment 6 Philip Withnall 2018-01-24 16:27:55 UTC
Review of attachment 367259 [details] [review]:

::: glib/tests/bytes.c
@@ +101,3 @@
 }
 
+static void

I’m trying to make sure that all new tests have a comment above the test function briefly summarising what the test does. Something like:

/* Test chained g_bytes_new_from_bytes() calls all use the root GBytes instance. */

@@ +106,3 @@
+  GBytes *bytes = g_bytes_new_static ("Some stupid data", strlen ("Some stupid data") + 1);
+  GBytes *bytes1 = g_bytes_new_from_bytes (bytes, 4, 13);
+  GBytes *bytes2 = g_bytes_new_from_bytes (bytes1, 0, 13);

I see bytes2 tests the code path I pointed out in my previous review to use the whole of the root GBytes. It would be better if this were a separate test case, since it’s basically a separate feature.
Comment 7 Philip Withnall 2018-01-24 16:28:45 UTC
(In reply to Christian Hergert from comment #2)
> It's certainly something the application can work around, but if you have to
> keep a GBytes + offset what's the point of using GBytes? I might as well
> just keep a malloc'd buffer around and reference that.
>
> …

OK, I’m convinced. Let’s do this!
Comment 8 Christian Hergert 2018-01-26 08:31:15 UTC
Created attachment 367457 [details] [review]
bytes: avoid intermediate refs in g_bytes_new_from_bytes()

When referencing a GBytes that is already a slice of another GBytes, we
can avoid referencing the intermediate GBytes and instead reference the
root bytes.

Doing so helps avoid keeping N GBytes instances alive when the
intermediates would have otherwise been finalized.
Comment 9 Christian Hergert 2018-01-26 08:31:20 UTC
Created attachment 367458 [details] [review]
tests: additional test for g_bytes_new_from_bytes()

This adds another test that ensures we are creating sub-slices properly
and that they reference the topmost bytes.

The structure for GBytes is copied into the test so that we can do field
integrety checking as part of the test.
Comment 10 Philip Withnall 2018-01-26 18:23:50 UTC
Review of attachment 367457 [details] [review]:

::: glib/gbytes.c
@@ +206,3 @@
+ * Since 2.56, if @offset is 0 and @length matches the size of @bytes, then
+ * @bytes will be returned with the reference count incremented by 1. If @bytes
+ * is a slice of a #GBytes, then the resulting #GBytes will reference that

s/slice of a/slice of another/, maybe?

@@ +207,3 @@
+ * @bytes will be returned with the reference count incremented by 1. If @bytes
+ * is a slice of a #GBytes, then the resulting #GBytes will reference that
+ * #GBytes instead of @bytes.

I was thinking that you could include an example of the type of use case you mentioned in comment #2, potentially including a code example if you can come up with one short enough.
Comment 11 Philip Withnall 2018-01-26 18:25:09 UTC
Review of attachment 367458 [details] [review]:

Great, thanks. The commit message needs a bit of tweaking since you split the two tests out, but that can be done before pushing.
Comment 12 Christian Hergert 2018-01-27 02:47:47 UTC
Created attachment 367499 [details] [review]
bytes: avoid intermediate refs in g_bytes_new_from_bytes()

When referencing a GBytes that is already a slice of another GBytes, we
can avoid referencing the intermediate GBytes and instead reference the
root bytes.

Doing so helps avoid keeping N GBytes instances alive when the
intermediates would have otherwise been finalized.
Comment 13 Christian Hergert 2018-01-27 02:47:52 UTC
Created attachment 367500 [details] [review]
tests: additional test for g_bytes_new_from_bytes()

This adds two new tests for g_bytes_new_from_bytes().

One test ensures that when creating a new GBytes that is a slice of
the entire base bytes, we just return the base bytes with it's reference
count incremented by one.

The other test ensures that when performing sub-slices of GBytes, for
which the parent GBytes also references a GBytes, that we skip the
intermediate GBytes and reference the base GBytes. Additional testing
of the internal state of the GBytes structure is performed to prove
the correctness of the implementation.
Comment 14 Christian Hergert 2018-01-27 02:48:55 UTC
I failed to create a succinct example, since it's mostly useful in async use cases. But I did add a blurb about that usage being simplified.
Comment 15 Philip Withnall 2018-01-27 12:26:23 UTC
Review of attachment 367499 [details] [review]:

Let’s go with this. Thanks for the tweaks. :-)
Comment 16 Philip Withnall 2018-01-27 12:26:50 UTC
Review of attachment 367500 [details] [review]:

++
Comment 17 Christian Hergert 2018-01-27 20:18:58 UTC
Thanks for the reviews!

Attachment 367499 [details] pushed as 4151bce - bytes: avoid intermediate refs in g_bytes_new_from_bytes()
Attachment 367500 [details] pushed as 47b78e6 - tests: additional test for g_bytes_new_from_bytes()