After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 663291 - GBytes: Immutable, refcounted sequence of bytes
GBytes: Immutable, refcounted sequence of bytes
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: general
unspecified
Other All
: Normal normal
: ---
Assigned To: gtkdev
gtkdev
Depends on:
Blocks:
 
 
Reported: 2011-11-03 06:55 UTC by Stef Walter
Modified: 2011-11-24 08:37 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Add g_byte_array_hash() and g_byte_array_equal() functions (5.30 KB, patch)
2011-11-03 06:55 UTC, Stef Walter
none Details | Review
After discussion with Bejamin and Ryan on IRC, this is what we came up with. (49.04 KB, patch)
2011-11-11 16:15 UTC, Stef Walter
reviewed Details | Review
Changes after review by Matthias (57.01 KB, patch)
2011-11-11 20:30 UTC, Stef Walter
reviewed Details | Review
Updated for Matthias' second review. (57.04 KB, patch)
2011-11-12 05:57 UTC, Stef Walter
none Details | Review
Updated with some changes, matching my comments. (58.26 KB, patch)
2011-11-14 16:57 UTC, Stef Walter
none Details | Review
Updated for IRC discussion (61.10 KB, patch)
2011-11-15 08:20 UTC, Stef Walter
none Details | Review
Updated patch, fix g_bytes_unref() (61.22 KB, patch)
2011-11-21 09:10 UTC, Stef Walter
none Details | Review
Updated after recent discussion about stealing GBytes memory. Important (67.68 KB, patch)
2011-11-22 11:10 UTC, Stef Walter
none Details | Review
Removed try_steal from public API, and made other minor changes outlined. (58.58 KB, patch)
2011-11-22 16:35 UTC, Stef Walter
none Details | Review
Now rebased on master. (58.59 KB, patch)
2011-11-22 16:56 UTC, Stef Walter
reviewed Details | Review
Updated for Matthias' review. (58.59 KB, patch)
2011-11-23 07:26 UTC, Stef Walter
accepted-commit_now Details | Review

Description Stef Walter 2011-11-03 06:55:39 UTC
This allows GByteArray to be used as a key in a hash table.
Comment 1 Stef Walter 2011-11-03 06:55:42 UTC
Created attachment 200570 [details] [review]
Add g_byte_array_hash() and g_byte_array_equal() functions

 * This allows GByteArray to be used as a key in a hash table
Comment 3 Allison Karlitskaya (desrt) 2011-11-03 12:58:48 UTC
I'm vaguely uncomfortable with using mutable types for a hashtable.

Doubly so when the main difference between it and a similar type is that it is easier to modify than the other one.

I suppose you're more focused on the ability to contain nuls, however.
Comment 4 Stef Walter 2011-11-03 15:15:05 UTC
(In reply to comment #3)
> I'm vaguely uncomfortable with using mutable types for a hashtable.

I was just thinking today about how it would have been helpful to have more immutable types in glib. Obviously GHashtable already has a bunch of mutable types usable as keys, at least:

 * g_int_hash
 * g_int64_hash
 * g_str_hash
 * g_string_hash

> Doubly so when the main difference between it and a similar type is that it is
> easier to modify than the other one.
> 
> I suppose you're more focused on the ability to contain nuls, however.

Yeah, trying to find a way to have an arbitrary set of bytes to be a key in a hash table. I guess GString can handle embedded nulls too, but GString doesn't do ref/unref :(

But if this isn't right to go into glib, it's not that much code to copy and paste around. Your call.
Comment 5 Dan Winship 2011-11-03 15:34:19 UTC
I'm sure I've written GByteArray hash/equal functions at least once before too. Given that we don't have an immutable bytestring type, GByteArray tends to get used for that.

(In reply to comment #4)
> I guess GString can handle embedded nulls too, but GString doesn't
> do ref/unref :(

and GString is also mutable anyway
Comment 6 Allison Karlitskaya (desrt) 2011-11-03 16:35:57 UTC
so GBuffer is a private API in GLib that is:

  a pointer to some data
  with a length
  refcounting
  immutability

It has a somewhat different target audience than your use case, but maybe we shouldn't really feel too constrained by that?
Comment 7 Dan Winship 2011-11-03 19:06:36 UTC
I've written one of those before too :) (SoupBuffer). And I know GStreamer has a refcounted immutable buffer type too. So GBuffer would probably be useful as a public type.
Comment 8 Stef Walter 2011-11-04 05:57:24 UTC
GBuffer seems like it would fit nicely.

The name seems strange to me for an immutable type, but maybe it's because the "buffer" types I've implemented recently have been for "buffering".
Comment 9 Stef Walter 2011-11-11 16:15:51 UTC
Created attachment 201240 [details] [review]
After discussion with Bejamin and Ryan on IRC, this is what we came up with.

Note that we've renamed it from 'buffer' to 'bytes'. Some people think of a
buffer a writable place for holding incoming/outgoing data. The 'bytes' name
is clearer, and is from python3.

GBytes: A new type for an immutable set of bytes.

 * Represents an immutable reference counted block of memory.
 * This is basically the internal glib GBuffer structure exposed,
   renamed, and with some additional capabilities.
 * The GBytes name comes from python3's immutable 'bytes' type
 * GBytes can be safely used as keys in hash tables, and have
   functions for doing so: g_bytes_hash, g_bytes_equal
 * GByteArray is a mutable form of GBytes, and vice versa. There
   are functions for converting from one to the other efficiently:
   g_bytes_unref_to_array() and g_byte_array_free_to_bytes()
 * Adds g_byte_array_new_take_data() to support above functions
Comment 10 Matthias Clasen 2011-11-11 16:57:08 UTC
Review of attachment 201240 [details] [review]:

::: glib/garray.c
@@ +1353,2 @@
  *
+ * #GBytes is an immutable sequence of bytes from an unspecified origin.

This is meant as a 'see also', I guess ? I seems a little unmotivated to add this paragraph in the middle of the GByteArray docs. I'd probably add it at the very bottom of the long description, and reword it a bit, like:

'See #GBytes if you are interested in an immutable object representing a sequence of bytes.'

@@ +1411,3 @@
+ *
+ * Create byte array containing the data. The data will be owned by the
+ * array and must have been allocated using the standard glib memory

Please use the official capitalization when referring to glib in the docs: GLib

Also, it might be more direct to say that the data will be freed by g_free(), than trying to enumerate all sources of such memory. Something like:

The data will be owned by the array and will be freed with g_free(), i.e. it could
be allocated using g_strdup().

@@ +1420,3 @@
+GByteArray *
+g_byte_array_new_take_data (guint8 *data,
+                            gsize   len)

This seems unrelated to GBytes.
Should be a separate commit before the GBytes addition, I'd say.

::: glib/gbufferprivate.h
@@ +23,3 @@
 
+#ifndef __G_BYTES_H__
+#define __G_BYTES_H__

The file should be renamed to gbytes.h too, right ?

@@ +43,3 @@
  * different procedures for freeing the memory region.  Examples are
  * memory from g_malloc(), from memory slices, from a #GMappedFile or
  * memory from other allocators.

Would be good to refer to the g_byte_array_free_to_bytes() function here, or discuss back-and-forth conversion GByteArray <> GBytes in general.

You should also mention the main purpose of this whole exercise: namely, that GBytes are suitable as keys in hash tables, using g_bytes_equal() and g_bytes_hash().

Finally, you probably should add a boxed type for GBytes in gobject/gvaluetypes.h, using the ref/unref functions.

::: glib/gbytesprivate.h
@@ +48,3 @@
+typedef struct
+{
+  GBytes publi;

'publi' sounds very unlike any glib naming convention. I'd propose either 'public_' or 'bytes'
Comment 11 Stef Walter 2011-11-11 20:30:36 UTC
Created attachment 201254 [details] [review]
Changes after review by Matthias
Comment 12 Stef Walter 2011-11-11 20:31:22 UTC
Thanks for looking it over so promptly :)

(In reply to comment #10)
> + * #GBytes is an immutable sequence of bytes from an unspecified origin.
> 
> This is meant as a 'see also', I guess ? I seems a little unmotivated to add
> this paragraph in the middle of the GByteArray docs. I'd probably add it at the
> very bottom of the long description, and reword it a bit, like:
> 
> 'See #GBytes if you are interested in an immutable object representing a
> sequence of bytes.'

Done.

> @@ +1411,3 @@
> + *
> + * Create byte array containing the data. The data will be owned by the
> + * array and must have been allocated using the standard glib memory
> 
> Please use the official capitalization when referring to glib in the docs: GLib
> 
> Also, it might be more direct to say that the data will be freed by g_free(),
> than trying to enumerate all sources of such memory. Something like:
> 
> The data will be owned by the array and will be freed with g_free(), i.e. it
> could
> be allocated using g_strdup().

Done.

> @@ +1420,3 @@
> +GByteArray *
> +g_byte_array_new_take_data (guint8 *data,
> +                            gsize   len)
> 
> This seems unrelated to GBytes.
> Should be a separate commit before the GBytes addition, I'd say.

It seems related. Without it g_bytes_unref_to_array() wouldn't work.

> ::: glib/gbufferprivate.h
> @@ +23,3 @@
> 
> +#ifndef __G_BYTES_H__
> +#define __G_BYTES_H__
> 
> The file should be renamed to gbytes.h too, right ?

Hmmm, in the patch gbufferprivate.h no longer exists, or am i missing something. FWIW, we do still have a gbytesprivate.h though so that GMappedFile can used the GRealBytes structure.

> @@ +43,3 @@
>   * different procedures for freeing the memory region.  Examples are
>   * memory from g_malloc(), from memory slices, from a #GMappedFile or
>   * memory from other allocators.
> 
> Would be good to refer to the g_byte_array_free_to_bytes() function here, or
> discuss back-and-forth conversion GByteArray <> GBytes in general.

Done.

> You should also mention the main purpose of this whole exercise: namely, that
> GBytes are suitable as keys in hash tables, using g_bytes_equal() and
> g_bytes_hash().

Done. In addition I added g_bytes_compare() so that it can be used with GTree.

BTW, that's one purpose of GBytes, but not the only one. GVariant uses it extensively to share immutable memory, and I'm using it in libgcr to back various ASN.1 parsed data, without making tons of copies.

> Finally, you probably should add a boxed type for GBytes in
> gobject/gvaluetypes.h, using the ref/unref functions.

Good point. Done.

> ::: glib/gbytesprivate.h
> @@ +48,3 @@
> +typedef struct
> +{
> +  GBytes publi;
> 
> 'publi' sounds very unlike any glib naming convention. I'd propose either
> 'public_' or 'bytes'

Changed to 'bytes'.
Comment 13 Matthias Clasen 2011-11-12 02:31:36 UTC
Review of attachment 201254 [details] [review]:

::: glib/gbytesprivate.h
@@ +50,3 @@
+  GBytes bytes;
+
+  /*< protected >*/

'protected' is not a gobject concept. Since the entire header is private anyway, the annotations are probably not adding much anyway.

::: glib/gmappedfile.c
@@ +91,1 @@
 /* Make sure the layout of GMappedFile is the same as GBuffer's */

s/GBuffer/GBytes/ there
Comment 14 Stef Walter 2011-11-12 05:57:38 UTC
Created attachment 201278 [details] [review]
Updated for Matthias' second review.
Comment 15 Matthias Clasen 2011-11-12 16:23:49 UTC
Maybe we should have GString <> GBytes conversions too ? 
There have been requests for static, refcounted strings before:
bug 622721
bug 612591
Comment 16 Benjamin Otte (Company) 2011-11-12 17:32:55 UTC
- Why is the structure not completely opaque?
There is no reason not to have get_data() and get_size() functions. In particular for cases where you can omit mapping the memory until you eventually need it. I could for example have an image buffer backed by a JPEG image that only got decoded on demand. Or a buffer referencing memory on the GPU that only got mapped into real memory once it's needed.
I suppose mitch and pippin could come up with a bunch of useful cases for the gimp, too.
(I'm not saying that the current API can do that, but if we ever think about going there, the public struct members make it impossible.

- I'm not a fan of the naming of the functions, in particular the constructors
1) the default constructor should be obvious. I don't think it's clear which of them is the one I want to most likely use. So unless we expect every function to be used equally often, we should have clear naming. IMO the take variant is the most common version, but I don't know.
2) pointers are pointers and not memory areas. So that name is completely unintuitive.
3) I think we never use take in constructors, so it feels kinda unnatural.
I would come up with:
_new()
_new_copy()
_new_static()
_new_with_free_func()
Unless someone now can't map these names to the old names immediately, I'd be convinced they are more in line with glib naming, give better results and bindings and they fulfill my 3 arguments.

- I'm not convinced about the immutability concept
The problem with immutability is that I imagine creating the buffers gets rather complex. You want to get a buffer from somewhere, then modify it, then pass it on. Say a checksum that is xor'ed with some specific number for one of the weird file formats. You'd use g_checksum_unref_to_bytes(), then to_byte_array() that, then modify the bytes, then to_bytes() the byte array?
If the bytes were mutable in the first place - say via a g_bytes_is_mutable() function - that'd be way easier. You'd just xor the bytes.
OTOH it allows a bunch of convenience. So I don't know.

- We have no way to make bytes mutable in other ways
Say I have my own allocated bytes (say an XShmBuffer). I now cannot make them writable, because GBuffer doesn't provide me with a way to check if any GBuffer is an shm buffer or if I first need to create a new shm region.
Maybe a get_free_func() would help there, but not sure.

- some things are missing
Not sure how important each of those are, but I consider a bunch of them quite nice to have (Swfdec has those, see http://cgit.freedesktop.org/swfdec/swfdec/tree/swfdec/swfdec_buffer.h - 
- g_byte_new_subbytes(original, offset, size) (sucky name)
This would do what it says: ref the original bytes and set pointer and size based on offset and size. It is rather important for parsing files or other data into chunks.
- g_bytes_new_from_file() and g_bytes_save_to_file()
Maybe this should be g_file_load_bytes(), but the ability to quickly load and save a file (ideally from a gdb prompt) - and knowing how easy it is - often is the difference between debugging in 5 seconds and next week.

And, my most important issue with this:

- There is no user
Before we commit to it, I'd like to have some other projects (in addition to GVariant) ported to it to be sure that we got it right. I'm volunteering to port Swfdec, but I'd like to see someone tackling libsoup.
Both to be sure that immutability doesn't suck, that we got all the APIs we need to make it useful, and that the APIs behave in the way you would expect them to.
Comment 17 Emmanuele Bassi (:ebassi) 2011-11-12 17:52:06 UTC
Cogl has a Buffer Object API as well:

  http://git.gnome.org/browse/cogl/tree/cogl/cogl-buffer.h

this API maps to the GL buffer objects as defined by the GL_ARB_vertex_buffer_object and GL_ARB_pixel_buffer_object extensions for highly efficient buffer handling:

  http://www.opengl.org/registry/specs/ARB/vertex_buffer_object.txt
  http://www.opengl.org/registry/specs/ARB/pixel_buffer_object.txt

the first extension, especially, should be a required read for everyone attempting to write a mutable/immutable memory buffer - one of the few occasions where the GL API is actually correct and sane.

we use CoglBuffer for most of the internals of our API, to describe vertex data like position, texture coordinates, and color.
Comment 18 Matthias Clasen 2011-11-12 18:48:09 UTC
(In reply to comment #16)

> 
> - There is no user
> Before we commit to it, I'd like to have some other projects (in addition to
> GVariant) ported to it to be sure that we got it right. I'm volunteering to
> port Swfdec, but I'd like to see someone tackling libsoup.
> Both to be sure that immutability doesn't suck, that we got all the APIs we
> need to make it useful, and that the APIs behave in the way you would expect
> them to.

To point out the obvious here:
Stef is doing this work because he needs this api (in gcr, I believe), so there is one out-of-glib user for you. Naturally, his use case is the hash-key one, with an emphasis on immutability, and doesn't cover all the other wild ideas that you have been throwing in here.
Comment 19 Dan Winship 2011-11-12 19:08:17 UTC
(In reply to comment #16)
> 3) I think we never use take in constructors, so it feels kinda unnatural.

There's g_simple_async_result_new_take_error(). At any rate, it's perfectly obvious what it means...

> You'd use g_checksum_unref_to_bytes(), then
> to_byte_array() that, then modify the bytes, then to_bytes() the byte array?

Not necessarily. "Immutable" here is mostly about guaranteeing that other people won't see changes once you've given them a pointer to it (eg, the hash table key case that started this whole thing), and more specifically, guaranteeing that the base pointer and size will never change (unlike with GByteArray). If no one else has a pointer to the memory yet, then you can safely cheat.

> - g_byte_new_subbytes(original, offset, size) (sucky name)

Yes, SoupBuffer has that too.

> Before we commit to it, I'd like to have some other projects (in addition to
> GVariant) ported to it to be sure that we got it right. I'm volunteering to
> port Swfdec, but I'd like to see someone tackling libsoup.
> Both to be sure that immutability doesn't suck, that we got all the APIs we
> need to make it useful, and that the APIs behave in the way you would expect
> them to.

SoupBuffer is part of libsoup's public API, so it can't really be ported to GBytes. But looking at gbytes.h, it's very very similar (assuming that "immutability" means what I described above).

The one thing SoupBuffer has that this doesn't is SOUP_MEMORY_TEMPORARY, which is a slightly weird optimization that avoids copies that would otherwise be needed in some places. (SoupBuffer doesn't have ref and unref functions, it has copy and free. Normally though, when you "copy" a SoupBuffer, it's just reffing it. However, if you copy a buffer that uses TEMPORARY memory, then it actually does make a copy. So as long as no one else decides to keep a copy of the buffer to look at later, you avoid ever copying the data.) But anyway, I don't think I'd suggest that that belonged in GBytes too.
Comment 20 Stef Walter 2011-11-14 16:30:05 UTC
(In reply to comment #15)
> Maybe we should have GString <> GBytes conversions too ? 
> There have been requests for static, refcounted strings before:
> bug 622721
> bug 612591

It's true that GString can be used for arbitrary bytes, since it supports non-null terminated data. So in theory we could add brain dead GString -> GBytes conversion function. 

But I think the bugs there make good arguments for having their string specific immutable, ref counted string type. Besides obvious complications like trying to make sure the byte data is null terminated, it's just not clear at this point that GBytes would be the best fit for these ref counted strings.
Comment 21 Stef Walter 2011-11-14 16:34:05 UTC
(In reply to comment #16)
> - Why is the structure not completely opaque?
> There is no reason not to have get_data() and get_size() functions. In
> particular for cases where you can omit mapping the memory until you eventually
> need it. I could for example have an image buffer backed by a JPEG image that
> only got decoded on demand. Or a buffer referencing memory on the GPU that only
> got mapped into real memory once it's needed.

The big benefit of having an immutable data structure is that multiple threads can use it really well. Sort of like how ref counting is something that fits well with threading, immutability is another tool that allows multiple threads and callers to operate on the same data without fear of one stomping on the other. If they need to make changes then they make their own copies.

So with that in mind, having this sort of decode-on-demand and other non deterministic behavior raises a host of questions about what we're actually trying to accomplish here. All of a sudden race conditions raise their head, locking is required when two threads call get_data() at the same time, dead locks become a possibility, what happens on error? It becomes rather non-deterministic.

I think that GLoadableIcon type interface fits well with the loading a JPEG image on demand, but I understand you're thinking generally here.

> (I'm not saying that the current API can do that, but if we ever think about
> going there, the public struct members make it impossible.

If 'load this data on demand' is a common general metaphor, then there probably be an interface specifically for that, rather than cramming it into GBytes.

> - I'm not a fan of the naming of the functions, in particular the constructors
> 1) the default constructor should be obvious. I don't think it's clear which of
> them is the one I want to most likely use. So unless we expect every function
> to be used equally often, we should have clear naming. IMO the take variant is
> the most common version, but I don't know.
> 2) pointers are pointers and not memory areas. So that name is completely
> unintuitive.
> 3) I think we never use take in constructors, so it feels kinda unnatural.
> I would come up with:
> _new()
> _new_copy()
> _new_static()
> _new_with_free_func()

Good points. The original names came from Ryan's internal GBuffer implementation, and I agree that they need to be cleaned up for public consumption. I've renamed the functions to:

g_bytes_new()
 - copies the data, this is the common glib mode of operations for
   'default' constructor
g_bytes_new_take()
 - takes the data and assumes it can be freed with g_free(). 'take' is common
   glib nomenclature, for assuming ownership of something, and it seems to make
   just as much sense in a constructor as elsewhere.
g_bytes_new_static()
g_bytes_new_with_free_func()
 - as suggested by you

> - I'm not convinced about the immutability concept
> The problem with immutability is that I imagine creating the buffers gets
> rather complex. You want to get a buffer from somewhere, then modify it, then
> pass it on. Say a checksum that is xor'ed with some specific number for one of
> the weird file formats. You'd use g_checksum_unref_to_bytes(), then
> to_byte_array() that, then modify the bytes, then to_bytes() the byte array?
> If the bytes were mutable in the first place - say via a g_bytes_is_mutable()
> function - that'd be way easier. You'd just xor the bytes.
> OTOH it allows a bunch of convenience. So I don't know.

It would be simple to implement g_bytes_is_mutable(), however it would be inherently racy when it came to multi-threading. 

If you want to pass around mutable buffers, then using GByteArray is a good fit. On the other hand if you want to pass around generally immutable bytes and here and there change something before passing it on, then the g_bytes_unref_to_array() function as implemented is perfect for that:

 * If you're the only one holding a reference to the bytes, and want to change   it, then you use g_bytes_unref_to_array() and make changes. You get the bonus of  efficiency, no memory duplication.
 * If someone else (eg: another thread, mapped file) is holding onto the GBytes, or they were allocated in an immutable way, then you get a copy from g_bytes_unref_to_array(). We automatically do the right thing in a completely thread-safe manner, and the someone else continues to the unchanged bytes as expected.

In my view there are several valid ways to return (and pass around) byte data, each serving its own purpose:

 1. Some APIs (think g_input_stream_read(), or g_checksum_get_digest()) return
    data into an already existing buffer. This shouldn't be (and won't) be
    subsumed by GBytes. 

 2. Some APIs it makes sense to return byte data that's easily mutable. These
    can return GByteArray (or if necessary a pointer + length out val). 

 3. And in some APIs it makes sense to return an thread-safe immutable set of
    bytes, and those should return GBytes.

I realize it would be nice to have "one true way" to return to bytes or strings, but as other programming frameworks have come to grips with, this just isn't useful in reality:

Java: String (immutable), StringBuilder (mutable)
python: bytes (immutable), bytearray (mutable)
and many more...

If this were C++ (ha ha) then we'd have a GByteArray constructor that takes GBytes, and GByteArray constructor that takes GBytes, which make the various forms transparent. However since we're in C, we get verbosity for better or worse.

> - We have no way to make bytes mutable in other ways
> Say I have my own allocated bytes (say an XShmBuffer). I now cannot make them
> writable, because GBuffer doesn't provide me with a way to check if any GBuffer
> is an shm buffer or if I first need to create a new shm region.
> Maybe a get_free_func() would help there, but not sure.

Or we could make something like g_bytes_unref_to_data_with_free_func() which returns the data if you're the last caller, and you passed in the right GDestroyNotify (ie: you know where it came from). It would return NULL otherwise. This would be a non-racy equivalent to what you're suggesting ... at least as I understand it.

> - some things are missing
> Not sure how important each of those are, but I consider a bunch of them quite
> nice to have (Swfdec has those, see
> http://cgit.freedesktop.org/swfdec/swfdec/tree/swfdec/swfdec_buffer.h - 
> - g_byte_new_subbytes(original, offset, size) (sucky name)

Done. I used that name, did you have ideas for a better one?

> This would do what it says: ref the original bytes and set pointer and size
> based on offset and size. It is rather important for parsing files or other
> data into chunks.
> - g_bytes_new_from_file() and g_bytes_save_to_file()
> Maybe this should be g_file_load_bytes(), but the ability to quickly load and
> save a file (ideally from a gdb prompt) - and knowing how easy it is - often is
> the difference between debugging in 5 seconds and next week.

Good point. However names with g_file_* should probably be related to GFile from now on, no? I know there are some earlier ones that were named like that, but we probably don't need to perpetuate the confusion.

Should g_bytes_new_from_file() try to map the file first, and then fall back to reading it?

> And, my most important issue with this:
> 
> - There is no user
> Before we commit to it, I'd like to have some other projects (in addition to
> GVariant) ported to it to be sure that we got it right. I'm volunteering to
> port Swfdec, but I'd like to see someone tackling libsoup.
> Both to be sure that immutability doesn't suck, that we got all the APIs we
> need to make it useful, and that the APIs behave in the way you would expect
> them to.

I agree. I've brought GBytes into gcr and am porting to it. Not just for use in GHashtables but for stuff like passing around certificate data, parsing, extracting data from DER structures, PK key values, etc...
Comment 22 Stef Walter 2011-11-14 16:34:50 UTC
(In reply to comment #19)
> > 3) I think we never use take in constructors, so it feels kinda unnatural.
> 
> There's g_simple_async_result_new_take_error(). At any rate, it's perfectly
> obvious what it means...

Agreed.

> > You'd use g_checksum_unref_to_bytes(), then
> > to_byte_array() that, then modify the bytes, then to_bytes() the byte array?
> 
> Not necessarily. "Immutable" here is mostly about guaranteeing that other
> people won't see changes once you've given them a pointer to it (eg, the hash
> table key case that started this whole thing), and more specifically,
> guaranteeing that the base pointer and size will never change (unlike with
> GByteArray). If no one else has a pointer to the memory yet, then you can
> safely cheat.

Exactly.

> The one thing SoupBuffer has that this doesn't is SOUP_MEMORY_TEMPORARY, which
> is a slightly weird optimization that avoids copies that would otherwise be
> needed in some places. (SoupBuffer doesn't have ref and unref functions, it has
> copy and free. Normally though, when you "copy" a SoupBuffer, it's just reffing
> it. However, if you copy a buffer that uses TEMPORARY memory, then it actually
> does make a copy. So as long as no one else decides to keep a copy of the
> buffer to look at later, you avoid ever copying the data.) But anyway, I don't
> think I'd suggest that that belonged in GBytes too.

So essentially this is to do with scoping? Where the original SoupBuffer is tightly scoped and its data cannot be kept in memory until all consumers of that data are done with it. Am I understanding that right? 

I see that in libsoup this is used together with alloca memory. What was the reasoning behind that? Speed? Memory fragmentation? Just interested.
Comment 23 Stef Walter 2011-11-14 16:57:24 UTC
Created attachment 201385 [details] [review]
Updated with some changes, matching my comments.

Ongoing discussion on IRC, will post another updated
patch soon.
Comment 24 Benjamin Otte (Company) 2011-11-14 17:23:26 UTC
(In reply to comment #21)
>
> [..]
>
In general, I agree on all you said. But we argued on IRC about this for a bit, and some important things came out of it:

1) We should make get_data() and get_size() functions out of principle, so that the GBytes struct is fully opaque.

2) We argued about the names for g_bytes_unref_to_data_with_free_func() and g_bytes_new_subbuffer() and came up with g_bytes_try_steal_and_unref() and g_bytes_new_from_bytes(), which I think sound both better and more glib'y.

3) g_bytes_new_*() functions should potentially guarantee what free func they use and list that in the documentation, so that other people implementing their own "make writable" function can use g_bytes_try_steal_and_unref() with the common types of buffers.
Comment 25 Dan Winship 2011-11-14 21:38:30 UTC
(In reply to comment #22)
> > The one thing SoupBuffer has that this doesn't is SOUP_MEMORY_TEMPORARY, which...

> So essentially this is to do with scoping? Where the original SoupBuffer is
> tightly scoped and its data cannot be kept in memory until all consumers of
> that data are done with it. Am I understanding that right? 

It's really more about efficiency. eg, in soup-message-io, it lets us reuse the same buffer for every read, rather than having to malloc a new buffer each time and then having the SoupBuffer take it. Which, admittedly, is not a huge difference. I think maybe there was more going on there when I originally wrote the code...
Comment 26 Stef Walter 2011-11-15 08:20:24 UTC
Created attachment 201424 [details] [review]
Updated for IRC discussion

* Added g_bytes_try_steal_and_unref(), which is pretty wild.
 * Rearrange structure, and privatize, add accessor methods
 * Naming cleanup
Comment 27 Benjamin Otte (Company) 2011-11-15 10:47:41 UTC
Some (In reply to comment #26)
> Created an attachment (id=201424) [details] [review]
> Updated for IRC discussion
>
The only thing I'm not sure conceptually about in this patch is g_bytes_unref_to_array(). As it only uses public GBytes API, we could rename it to g_byte_array_new_take_bytes() now and avoid the weird name. But no strong opinion on that.

And here's some code review:
* You forgot to make g_bytes_unref() do a g_slice_free()
* g_bytes_new_from_bytes() should either take a gsize as the offset parameter (my preference) or g_return_if_fail (offset < 0)
* Allowing NULL as a magic value to the free_func in g_bytes_try_steal_and_unref() is problematic as you can't check for a NULL destroy notify now. Why not just require people use g_free?
* The docs don't clearly list the free funcs for people to use with g_bytes_try_steal_and_unref().
* g_bytes_compare()'s compare function does not have an actual sorting purpose, right? Just asking because if I had written it, I'd have compared the size first. :)
* g_bytes_unref() should probably take a GBytes * instead of a gpointer to conform to glib behavior?
Comment 28 Stef Walter 2011-11-21 09:07:17 UTC
So, during the past week I've migrated much of the gcr DER stuff and other parsing, file building internals to GBytes. You can see the patch here:

http://cgit.collabora.com/git/user/stefw/gcr.git/?h=bytes

In general this was a good change, although added some complexity. The complexity however is because the result is more correct. Previously there were a lot of assumptions about how long a buffer must exist in order to support parsed data, and this was hard to guarantee.

I did this migration internally, and not did not (yet) expose any public API with GBytes. So I did have to transfer to and from GBytes at public API boundaries.

Some notes:

 * The fact that g_bytes_unref() doesn't allow a NULL is really irritating.

 * I found myself doing this in order to get data to return from an API that
   needed to return a pointer + length:

   g_byte_array_free (g_bytes_unref_to_array (bytes), FALSE);

   Maybe it would make sense to have g_bytes_unref_to_data()?

 * I saw several locations where it would have been nice to have the SoupBuffer
   'temporary' stuff that Dan talks about above. All the cases where I saw this
   however it was due to trying to take const data passed into a public API,
   and placing it into a GBytes internally.

(In reply to comment #27)
> Some (In reply to comment #26)
> > Created an attachment (id=201424) [details] [review] [details] [review]
> > Updated for IRC discussion
> >
> The only thing I'm not sure conceptually about in this patch is
> g_bytes_unref_to_array(). As it only uses public GBytes API, we could rename it
> to g_byte_array_new_take_bytes() now and avoid the weird name. But no strong
> opinion on that.

I guess we could. However the new name is unclear, it doesn't necessarily 'take' the byte data, although it does 'take' a reference to the GBytes.

> And here's some code review:
> * You forgot to make g_bytes_unref() do a g_slice_free()

Oops. Done.

> * g_bytes_new_from_bytes() should either take a gsize as the offset parameter
> (my preference) or g_return_if_fail (offset < 0)

Changed to gsize.

> * Allowing NULL as a magic value to the free_func in
> g_bytes_try_steal_and_unref() is problematic as you can't check for a NULL
> destroy notify now. Why not just require people use g_free?

That's what I thought at first, but it turns out that g_free is a different memory location when referred to from within the libglib.so library, and from an application linking to it. If I recall correctly, one ends up as "g_free@plt" or something like that.

> * g_bytes_compare()'s compare function does not have an actual sorting purpose,
> right? Just asking because if I had written it, I'd have compared the size
> first. :)

I assumed that it could actually be used for sorting.

> * g_bytes_unref() should probably take a GBytes * instead of a gpointer to
> conform to glib behavior?

Then it always needs to be cast to a GDestroyNotify when used that way. In my migration, I used tons of g_bytes_new_with_free_func with g_bytes_unref as a free_func. But I can change this if necessary.
Comment 29 Stef Walter 2011-11-21 09:10:33 UTC
Created attachment 201792 [details] [review]
Updated patch, fix g_bytes_unref()
Comment 30 Dan Winship 2011-11-21 13:18:45 UTC
(In reply to comment #28)
>  * The fact that g_bytes_unref() doesn't allow a NULL is really irritating.

yeah, see bug 661199 (though one of the objections there was about *changing* the semantics of g_object_unref(), so having g_bytes_unref() start out allowing NULL might be more palatable).

>  * I saw several locations where it would have been nice to have the SoupBuffer
>    'temporary' stuff that Dan talks about above.

can you point to some examples?

> > * g_bytes_unref() should probably take a GBytes * instead of a gpointer to
> > conform to glib behavior?
> 
> Then it always needs to be cast to a GDestroyNotify when used that way. In my
> migration, I used tons of g_bytes_new_with_free_func with g_bytes_unref as a
> free_func.

But all of those places should be using g_bytes_new_from_bytes() anyway.
Comment 31 Stef Walter 2011-11-22 11:10:52 UTC
Created attachment 201917 [details] [review]
Updated after recent discussion about stealing GBytes memory. Important

to note that this is corner case functionality.

GBytesMemoryType describes the way that a GBytes was created, and then
can be passed to g_bytes_unref_steal_unref() to steal memory.

Renamed g_bytes_new_with_free_func() to g_bytes_new_custom(). For GBytes
created with g_bytes_new_custom() use g_bytes_unref_steal_unref_custom()
to steal memory.

Fix argument type for g_bytes_unref(). Add g_bytes_unref_to_data().
Comment 32 Stef Walter 2011-11-22 11:28:53 UTC
(In reply to comment #30)
> (In reply to comment #28)
> >  * The fact that g_bytes_unref() doesn't allow a NULL is really irritating.
> 
> yeah, see bug 661199 (though one of the objections there was about *changing*
> the semantics of g_object_unref(), so having g_bytes_unref() start out allowing
> NULL might be more palatable).

Good plan.

> >  * I saw several locations where it would have been nice to have the SoupBuffer
> >    'temporary' stuff that Dan talks about above.
> 
> can you point to some examples?

For example where we pass in data to gcr_parser_parse_data(), which then gets put into a GBytes for handling. In the future we'll have a public API like, gcr_parser_parse_bytes().

This code path is used for parsing keys and other data placed in non-pageable memory, and we don't want it copied unnecessarily. So temporarily (until GBytes is finalized) in gcr_parser_parse_data() I'm just using a g_bytes_new_static() because I know nothing will hold a reference to that memory. But this is fragile, and only temporary.

Using a SOUP_MEMORY_TEMPORARY style thing would work nicely for this situation. Although it's quite a corner case. Normally you would copy the data when retrofitting GBytes into such an API.

> > > * g_bytes_unref() should probably take a GBytes * instead of a gpointer to
> > > conform to glib behavior?
> > 
> > Then it always needs to be cast to a GDestroyNotify when used that way. In my
> > migration, I used tons of g_bytes_new_with_free_func with g_bytes_unref as a
> > free_func.
> 
> But all of those places should be using g_bytes_new_from_bytes() anyway.

Yeah, I guess they should. In many cases in egg-asn1x.c I had a pointer and size, rather than an offset. But I should just calculate an offset and use g_bytes_new_from_bytes().
Comment 33 Benjamin Otte (Company) 2011-11-22 14:13:08 UTC
So, after disucssing the mess that is public function names vs glib internal function names and the layering violations (IMO) that this new GBytesMemoryType is - it exposes this fact about the linker in the API - I think it's better we remove g_bytes_try_steal_unref() from the public API for now. In particular because there was no need for that function in Stef's GCR port or my Swfdec work.

The only thing from Swfdec I can't do with GBytes is getting an eqivalent to swfdec_buffer_get_super() - see http://cgit.freedesktop.org/swfdec/swfdec/tree/swfdec/swfdec_buffer.c#n202 - but we agreed it's probably too much of a corner case to worry about it.

So my vote is on the patch from comment 29 with
- removing of g_bytes_try_steal_and_unref() from the public API
- the fix that is not in there to make g_bytes_unref() take a GBytes
Comment 34 Stef Walter 2011-11-22 16:35:32 UTC
Created attachment 201936 [details] [review]
Removed try_steal from public API, and made other minor changes outlined.

Removed the concept of GBytesMemoryType as well.

Does this look good to merge?
Comment 35 Stef Walter 2011-11-22 16:56:20 UTC
Created attachment 201938 [details] [review]
Now rebased on master.
Comment 36 Matthias Clasen 2011-11-22 18:15:08 UTC
Review of attachment 201938 [details] [review]:

I'm failing to see a typedef for GBytes in any public header here - don't we still need that ?
Comment 37 Matthias Clasen 2011-11-22 18:20:38 UTC
Oh, the typedef is in garray.h. Thats a little confusing, maybe.
Comment 38 Matthias Clasen 2011-11-22 18:37:25 UTC
Review of attachment 201938 [details] [review]:

::: glib/gbytes.c
@@ +76,3 @@
+ * bytes is no longer in use. Because of this @data must have been created by
+ * a call to g_malloc(), g_malloc0() or g_realloc() or by one of the many
+ * functions that wrap these calls (such as g_new(), g_strdup(), etc).

Might be worth adding a xref, like: 

For creating #GBytes with memory from other allocators, see g_bytes_new_with_free_func().

@@ +151,3 @@
+ * @bytes: a #GBytes
+ * @offset: offset which subsection starts at
+ * @length: length of subsucsection

Typo: subsucsection

@@ +153,3 @@
+ * @length: length of subsucsection
+ *
+ * Creates a #GBytes which is a subsection of another #GBytes.

Might be worth making it explicit that offset + length must not be larger than the size of @bytes.

@@ +220,3 @@
+ * Increase the reference count on @bytes.
+ *
+ * Returns: (transfer full): the #GBytes

Confusing annotation...isn't the entire purpose of ref() is to obtain (co-)ownership...

@@ +322,3 @@
+ *
+ * This function can be passed to g_tree_new() when using non-%NULL #GBytes
+ * pointers as keys in a #GTree.

Should also mention that this function can be used to sort GBytes' instances. The GTree reference seems a little obscure by comparison

@@ +343,3 @@
+  if (ret == 0 && b1->size != b2->size)
+      ret = b1->size < b2->size ? -1 : 1;
+  return ret;

As Benjamin pointed out, we should make the obvious efficiency fix here, and compare size first, before going into memcmp. That also allows you to get rid of the MIN there

@@ +424,3 @@
+ *
+ * As an optimization, the byte data is transferred to the array without copying
+ * if: this was the last reference to bytes and bytes was created with

the : after if seems extraneous.

::: glib/gbytes.h
@@ +57,3 @@
+ *
+ * Since: 2.32
+ **/

We generally keep SECTION docs at the top of the .c in GLib.
Comment 39 Stef Walter 2011-11-23 07:25:59 UTC
(In reply to comment #37)
> Oh, the typedef is in garray.h. Thats a little confusing, maybe.

Yeah. Do you want me to move both GByteArray and GBytes typedefs to another location? the gbytes.h and garray.h headers both depend on both typedefs.

(In reply to comment #38)
> Review of attachment 201938 [details] [review]:
> 
> ::: glib/gbytes.c
> @@ +76,3 @@
> + * bytes is no longer in use. Because of this @data must have been created by
> + * a call to g_malloc(), g_malloc0() or g_realloc() or by one of the many
> + * functions that wrap these calls (such as g_new(), g_strdup(), etc).
> 
> Might be worth adding a xref, like: 
> 
> For creating #GBytes with memory from other allocators, see
> g_bytes_new_with_free_func().

Done.

> @@ +151,3 @@
> + * @bytes: a #GBytes
> + * @offset: offset which subsection starts at
> + * @length: length of subsucsection
> 
> Typo: subsucsection

Fixed.

> @@ +153,3 @@
> + * @length: length of subsucsection
> + *
> + * Creates a #GBytes which is a subsection of another #GBytes.
> 
> Might be worth making it explicit that offset + length must not be larger than
> the size of @bytes.

Done.

> @@ +220,3 @@
> + * Increase the reference count on @bytes.
> + *
> + * Returns: (transfer full): the #GBytes
> 
> Confusing annotation...isn't the entire purpose of ref() is to obtain
> (co-)ownership...

Yeah, I always thought 'full' was confusing. Essentially this annotation means that the caller must free the returned reference. Although annotations on ref() and unref() functions probably aren't used by gobject-introspection anyway, so I'll remove them.

> @@ +322,3 @@
> + *
> + * This function can be passed to g_tree_new() when using non-%NULL #GBytes
> + * pointers as keys in a #GTree.
> 
> Should also mention that this function can be used to sort GBytes' instances.
> The GTree reference seems a little obscure by comparison

Done.

> @@ +343,3 @@
> +  if (ret == 0 && b1->size != b2->size)
> +      ret = b1->size < b2->size ? -1 : 1;
> +  return ret;
> 
> As Benjamin pointed out, we should make the obvious efficiency fix here, and
> compare size first, before going into memcmp. That also allows you to get rid
> of the MIN there

Then the sorting would be incorrect. Not all values with a smaller size sort before those with a longer size. Use g_bytes_equal() if you want the extra efficiency.

> @@ +424,3 @@
> + *
> + * As an optimization, the byte data is transferred to the array without
> copying
> + * if: this was the last reference to bytes and bytes was created with
> 
> the : after if seems extraneous.

Removed.

> ::: glib/gbytes.h
> @@ +57,3 @@
> + *
> + * Since: 2.32
> + **/
> 
> We generally keep SECTION docs at the top of the .c in GLib.

Moved.
Comment 40 Stef Walter 2011-11-23 07:26:21 UTC
Created attachment 201973 [details] [review]
Updated for Matthias' review.
Comment 41 Matthias Clasen 2011-11-23 16:07:26 UTC
(In reply to comment #39)
> (In reply to comment #37)
> > Oh, the typedef is in garray.h. Thats a little confusing, maybe.
> 
> Yeah. Do you want me to move both GByteArray and GBytes typedefs to another
> location? the gbytes.h and garray.h headers both depend on both typedefs.

Lets leave it there for now.

> 
> > @@ +343,3 @@
> > +  if (ret == 0 && b1->size != b2->size)
> > +      ret = b1->size < b2->size ? -1 : 1;
> > +  return ret;
> > 
> > As Benjamin pointed out, we should make the obvious efficiency fix here, and
> > compare size first, before going into memcmp. That also allows you to get rid
> > of the MIN there
> 
> Then the sorting would be incorrect. Not all values with a smaller size sort
> before those with a longer size. Use g_bytes_equal() if you want the extra
> efficiency.

Depends on what order you want, I'd say. You get a total linear either way.  
I'm fine with keeping this function as it is, as long as the docs say something
about the order that is being defined (just mentioning 'lexicographic order' may be good enough).

With that clarification, things look good to me.
Comment 42 Stef Walter 2011-11-24 08:37:03 UTC
Merged.